Abstract

The problem of estimating a probability density function $f$ on the $(d-1)$-dimensional unit sphere $S^{d-1}$ from directional data using the needlet frame is considered. It is shown that the decay of the needlet coefficients supported near a point $x \in S^{d-1}$ of a function $f : S^{d-1} \to \mathbb{R}$ depends only on local Hölder continuity properties of $f$ at $x$. This is then used to show that the thresholded needlet estimator introduced in Baldi, Kerkyacharian, Marinucci and Picard [2] adapts to the local regularity properties of $f$. Moreover, an adaptive confidence interval for $f$ based on the thresholded needlet estimator is proposed, which is asymptotically honest over suitable classes of locally Hölderian densities.
1 Introduction

Let $S^{d-1}$ denote the surface of the unit sphere in $\mathbb{R}^d$ and let $X = \{X_1, X_2, \ldots, X_n\}$ be an independent and identically distributed sample of $n$ values from some probability density function $f : S^{d-1} \to \mathbb{R}$. Our goal is to estimate $f$ from the sample. A classical method for doing this is kernel density estimation, see [8, 12]. In the past few years, functions known as needlets have been constructed, which give us a powerful new tool to tackle this problem. Needlets are built on the spherical harmonics to form a tight frame for the space $L^2(S^{d-1})$ of square-integrable functions on $S^{d-1}$, in such a way that they have a localised projection kernel (see [15, 16]). Baldi, Kerkyacharian, Marinucci and Picard [2] have shown how to use the needlet frame combined with standard thresholding techniques to construct an estimator that achieves the minimax convergence rates (up to log terms) over the usual Besov spaces in the $L^p$ norms, $1 < p < \infty$. These Besov spaces contain functions which can be approximated well by spherical polynomials globally on $S^{d-1}$ in the $L^p$ norms, and thus model homogeneous smoothness properties of functions on $S^{d-1}$.

However, these results do not address spatially inhomogeneous smoothness properties of $f$, for which there are currently no results in the literature for functions on $S^{d-1}$ (although results for functions on the real line exist, as in [1, 13]). For example, the density $f$ could be $t$-differentiable except at a single point $y$, where it behaves locally like $d(x,y)^{t'}$ for some noninteger $t' < t$, $d$ being the geodesic distance on $S^{d-1}$; this means that the function cannot be $t$-differentiable globally. We thus define $f : S^{d-1} \to \mathbb{R}$ to be locally $t$-Hölder-continuous at $x \in S^{d-1}$ if it can be approximated locally by a polynomial of degree $\lfloor t \rfloor$ with suitable error bounds, as suggested by Jaffard [10]. We show that the local needlet coefficients and the approximation errors of the corresponding local needlet projections of such functions obey decay properties that reflect only the pointwise Hölderian regularity, a result that does not follow from the global needlet characterisation of Besov spaces on $S^{d-1}$ obtained in [15]. This is the analogue of the results for wavelets by Andersson [1]. It can be used to prove explicitly that the thresholded needlet estimator from [2] is locally minimax-optimal within logarithmic factors, which is the subject of our first main result, Theorem 2. We note that these logarithmic factors are probably necessary, since they also appear (and are necessary) in the estimation of densities on the real line [13].

A next challenge is to construct a confidence interval for the unknown value $f(x)$ at $x \in S^{d-1}$. One can use Bernstein's inequality to find a confidence interval for each unknown needlet coefficient, centred at the corresponding empirical needlet coefficient. To create a confidence interval for $f$ centred at the thresholded needlet estimator, we create a confidence interval around each non-thresholded coefficient and sum the results. This procedure is shown to give confidence intervals of adaptive expected length. Proving coverage, however, requires additional assumptions: it is well known from Low [14] that adaptive and honest confidence intervals cannot exist over the usual smoothness classes. The method indicated by [7, 9, 17] is to assume that the underlying function satisfies a further lower bound condition on the local decay of the needlet coefficients. Indeed, we show in Theorem 3 that the proposed confidence interval is asymptotically honest over functions satisfying this condition. In contrast with these results, however, our confidence intervals are spatially adaptive. The practical implementation of this estimator is beyond the scope of this paper; it is hoped that a further paper will address this.

The paper is organised as follows. We first summarise the construction of needlets in Section 2.1. Then we define the local regularity spaces $C^t_{M,\delta}(x)$ and show that the needlet coefficients of such functions obey appropriate decay properties. We also find a lower bound for the minimax error of pointwise estimators for densities in these spaces. Following that, we introduce the hard thresholded estimator, use these properties to show that it is locally near-minimax-optimal, and construct asymptotic confidence intervals around it. Finally, all proofs are given at the end.



2 Needlets and Local Regularity Properties of Functions 

2.1 The Needlet Frame 

We start with a review of spherical harmonics; more details can be found in Stein and Weiss [18] or in Faraut [4]. Let $L^2(S^{d-1})$ be the space of square integrable functions on $S^{d-1}$ with the natural inner product

$$\langle f, g \rangle = \int_{S^{d-1}} f(x)\, g(x)\, dx,$$

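where $dx$ is normalised such that $\int_{S^{d-1}} dx$ equals the Lebesgue measure $\omega_{d-1}$ of $S^{d-1}$. Let further $\mathcal{H}_k(S^{d-1})$ be the space of spherical harmonics of degree $k$. Then

$$L^2(S^{d-1}) = \bigoplus_{k \ge 0} \mathcal{H}_k(S^{d-1})$$

with convergence in $L^2(S^{d-1})$. If we denote the projection kernels onto the spaces $\{\mathcal{H}_k\}_{k \ge 0}$ by $\{Z^k(\cdot,\cdot)\}_{k \ge 0}$, we thus have

$$f(x) = \sum_{k \ge 0} \int_{S^{d-1}} Z^k(x,y)\, f(y)\, dy$$

with convergence in $L^2(S^{d-1})$. It is well known that, for $d \ge 3$ and $\lambda = (d-2)/2$,

$$Z^k(x,y) = \frac{k + \lambda}{\lambda\, \omega_{d-1}}\, P_k^{\lambda}(\langle x, y \rangle),$$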

where $P_k^{(d-2)/2}$ is the corresponding ultraspherical (or Gegenbauer) polynomial and $\langle \cdot, \cdot \rangle$ is the Euclidean inner product in $\mathbb{R}^d$. Since, for fixed $x$, the kernels $Z^k(x,\cdot)$ are themselves spherical harmonics of degree $k$, we have

$$\int_{S^{d-1}} Z^k(x,y)\, Z^m(z,y)\, dy = \delta_{km}\, Z^k(x,z). \qquad (1)$$
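As a numerical sanity check of the normalisation of $Z^k$ (our own illustration, not part of the original argument): $Z^k(x,x) = \sum_l |Y_l(x)|^2$ for any orthonormal basis $\{Y_l\}$ of $\mathcal{H}_k$, and it is constant in $x$ by rotation invariance, so integrating over $x$ gives $\omega_{d-1} Z^k(x,x) = \dim \mathcal{H}_k(S^{d-1})$. The sketch below assumes $d \ge 3$, evaluates $P_k^{\lambda}$ with the standard three-term recurrence, and uses the dimension formula $\dim\mathcal{H}_k(S^{d-1}) = \binom{k+d-1}{d-1} - \binom{k+d-3}{d-1}$. The helper names are hypothetical.

```python
import math

def gegenbauer(k: int, lam: float, x: float) -> float:
    """Gegenbauer polynomial C_k^lam(x) via n C_n = 2x(n+lam-1) C_{n-1} - (n+2lam-2) C_{n-2}."""
    if k == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * lam * x
    for n in range(2, k + 1):
        c_prev, c = c, (2.0 * x * (n + lam - 1) * c - (n + 2 * lam - 2) * c_prev) / n
    return c

def sphere_area(d: int) -> float:
    """omega_{d-1}: surface area of the unit sphere S^{d-1} in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def zonal_kernel(k: int, d: int, inner: float) -> float:
    """Z^k(x, y) as a function of <x, y>; requires d >= 3 so that lam = (d-2)/2 > 0."""
    lam = (d - 2) / 2
    return (k + lam) / (lam * sphere_area(d)) * gegenbauer(k, lam, inner)

def dim_harmonics(k: int, d: int) -> int:
    """dim H_k(S^{d-1}) = C(k+d-1, d-1) - C(k+d-3, d-1) (valid for d >= 3)."""
    return math.comb(k + d - 1, d - 1) - math.comb(k + d - 3, d - 1)

for d in range(3, 6):
    print(d, [round(zonal_kernel(k, d, 1.0) * sphere_area(d), 6) for k in range(5)])
```

For $d = 3$ this reproduces $\dim\mathcal{H}_k(S^2) = 2k+1$, and for $d = 4$ it reproduces $(k+1)^2$.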

The rest of this section follows Narcowich, Petrushev and Ward [15, 16], who first showed how to construct a needlet frame from the spherical harmonics. We start with a Littlewood-Paley decomposition. Let $a$ be a decreasing $C^\infty$ function on $\mathbb{R}^+$, compactly supported on $[0,1]$, such that $a(x) = 1$ for $x \in [0, 1/2]$. We also define $b(x) = a(x/2) - a(x)$, which is compactly supported on $[1/2, 2]$. We define

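$$A_j(f)(x) := \int_{S^{d-1}} A_j(x,y)\, f(y)\, dy, \qquad A_j(x,y) := \sum_k a\left(\frac{k}{2^j}\right) Z^k(x,y),$$

$$B_j(f)(x) := \int_{S^{d-1}} B_j(x,y)\, f(y)\, dy, \qquad B_j(x,y) := \sum_k b\left(\frac{k}{2^j}\right) Z^k(x,y).$$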
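The multiplier identity behind this decomposition can be checked numerically. The sketch below uses a hypothetical choice of $a$ built from the standard $e^{-1/u}$ bump (the paper does not fix a particular $a$). It verifies that $b$ is nonnegative, vanishes outside $(1/2, 2)$, and that $a(k/2^J) = a(k) + \sum_{j=0}^{J-1} b(k/2^j)$, which is the statement $A_J = A_0 + \sum_{j<J} B_j$ at the level of the multipliers of $Z^k$.

```python
import numpy as np

def _h(u):
    # e^{-1/u} for u > 0 and 0 otherwise; a C-infinity function (inner where avoids 1/0)
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, np.exp(-1.0 / np.where(u > 0, u, 1.0)), 0.0)

def a(x):
    """Hypothetical cutoff: smooth, decreasing, a = 1 on [0, 1/2] and a = 0 on [1, inf)."""
    s = 2.0 * np.asarray(x, dtype=float) - 1.0
    return 1.0 - _h(s) / (_h(s) + _h(1.0 - s))

def b(x):
    """b(x) = a(x/2) - a(x); vanishes outside (1/2, 2)."""
    x = np.asarray(x, dtype=float)
    return a(x / 2.0) - a(x)

k = np.arange(0, 300)
J = 6
lhs = a(k / 2.0 ** J)                                  # multiplier of A_J
rhs = a(k) + sum(b(k / 2.0 ** j) for j in range(J))    # multipliers of A_0 + sum_j B_j
print(np.max(np.abs(lhs - rhs)))
```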



It is clear that $A_j(f)$ converges to $f$ in the $L^2(S^{d-1})$ norm, since $a(k/2^j) \to 1$ for every fixed $k$. More importantly, the kernel $A_j$ can be shown to be localised, see [15], Eqn 1.2: for all $m > 0$ there exists $c_m > 0$ such that for all $x, y, j, d$

$$|A_j(x,y)| \le \frac{c_m\, 2^{j(d-1)}}{(1 + 2^j d(x,y))^m}, \qquad (2)$$

where $d(x,y)$ is the geodesic distance between $x, y \in S^{d-1}$. This localisation implies that the error $|A_j(f)(x) - f(x)|$ decays exponentially in $j$ for certain classes of functions, see Proposition 1 later on. Now, if we define



$$C_j(x,y) := \sum_k \sqrt{b\left(\frac{k}{2^j}\right)}\, Z^k(x,y) = \sum_{2^{j-1} < k < 2^{j+1}} \sqrt{b\left(\frac{k}{2^j}\right)}\, Z^k(x,y),$$

then, by repeated use of equation (1), we can split $B_j$:

$$B_j(x,y) = \int_{S^{d-1}} C_j(x,u)\, C_j(y,u)\, du.$$

Finally, we notice that for fixed $x$ and $y$, $u \mapsto C_j(x,u)\, C_j(u,y)$ is a polynomial of degree at most $2^{j+2}$. By the quadrature formulae in Section 4.2 of [16] and Theorem 2.8 in [15], there exists $c > 0$ such that for all $j$ there is a set of points $\mathcal{H}_j = \{\xi_1, \xi_2, \ldots, \xi_{N_j}\}$ with $d(\xi_k, \xi_l) > c\, 2^{-j}$ for $\xi_k \neq \xi_l$, and a set of positive weights $\{\lambda_\eta\}_{\eta \in \mathcal{H}_j}$, such that for all polynomials $f$ of degree at most $2^{j+2}$

$$\int_{S^{d-1}} f(x)\, dx = \sum_{\eta \in \mathcal{H}_j} \lambda_\eta\, f(\eta).$$
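For intuition, the simplest instance of such a quadrature is the circle $S^1$ ($d = 2$), where spherical polynomials of degree $L$ are trigonometric polynomials of degree $L$. The sketch below assumes $N_j = 2^{j+2} + 1$ equispaced points with equal weights $\lambda_\eta = 2\pi/N_j$ (one valid choice on $S^1$, not the general construction cited above) and checks exactness for degree $2^{j+2}$; the points are separated by $2\pi/N_j \ge (2\pi/5)\, 2^{-j}$.

```python
import numpy as np

def circle_quadrature(j):
    """Equispaced rule on S^1 with N_j = 2^(j+2) + 1 points and weights 2*pi/N_j.

    It integrates every trigonometric polynomial of degree <= 2^(j+2) exactly,
    because sum_k cos(m theta_k) = sum_k sin(m theta_k) = 0 for 1 <= m < N_j.
    """
    n_pts = 2 ** (j + 2) + 1
    theta = 2.0 * np.pi * np.arange(n_pts) / n_pts
    weights = np.full(n_pts, 2.0 * np.pi / n_pts)
    return theta, weights

rng = np.random.default_rng(0)
errs = []
for j in range(5):
    theta, w = circle_quadrature(j)
    deg = 2 ** (j + 2)
    c, s = rng.normal(size=deg + 1), rng.normal(size=deg + 1)
    m = np.arange(deg + 1)[:, None]
    values = (c[:, None] * np.cos(m * theta) + s[:, None] * np.sin(m * theta)).sum(axis=0)
    exact = 2.0 * np.pi * c[0]          # only the constant term survives integration
    errs.append(abs(np.dot(w, values) - exact))
print(errs)
```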



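Hence we obtain the following expression for $B_j(f)$:

$$B_j(f)(x) = \int_{S^{d-1}} B_j(x,y)\, f(y)\, dy = \sum_{\eta \in \mathcal{H}_j} \lambda_\eta\, C_j(x,\eta) \int_{S^{d-1}} C_j(y,\eta)\, f(y)\, dy.$$

This motivates the definition of needlets $\psi_{j\eta} = \sqrt{\lambda_\eta}\, C_j(\cdot,\eta)$, and if $\beta_{i\eta} = \langle f, \psi_{i\eta} \rangle$, then

$$A_J(f) = A_0(f) + \sum_{i=0}^{J-1} B_i(f) = \frac{1}{\omega_{d-1}} \int_{S^{d-1}} f(y)\, dy + \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} \beta_{i\eta}\, \psi_{i\eta},$$

where the first term equals $1/\omega_{d-1}$ when $f$ is a probability density.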

From Corollary 5.3 in [16] we know that the needlets $\psi_{i\eta}$ are also localised: for all $m > 0$ there exists $c_m > 0$ such that for all $\eta, y, i, d$

$$|\psi_{i\eta}(y)| \le \frac{c_m\, 2^{i(d-1)/2}}{(1 + 2^i d(y,\eta))^m}. \qquad (3)$$



We will need the following lemma, which follows from the previous equation:

Lemma 1. There exists a constant $C_1$ such that for all $i$ and all $y \in S^{d-1}$,

$$\sum_{\eta \in \mathcal{H}_i} |\psi_{i\eta}(y)| \le C_1\, 2^{i(d-1)/2}.$$
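To make the construction concrete, here is a sketch on the circle $S^1$ ($d = 2$, $\omega_1 = 2\pi$), where $Z^0 = 1/(2\pi)$ and $Z^k(\eta,\theta) = \cos(k(\theta - \eta))/\pi$ for $k \ge 1$. It assumes a hypothetical cutoff $a$ built from the $e^{-1/u}$ bump and the equispaced quadrature with $2^{j+2}+1$ points and weights $2\pi/(2^{j+2}+1)$ (both our own choices). It builds $\psi_{j\eta} = \sqrt{\lambda_\eta}\, C_j(\cdot,\eta)$ and checks two things. First, $\frac{1}{2\pi}\int f + \sum_{j<J}\sum_\eta \beta_{j\eta}\psi_{j\eta}$ reproduces $A_J(f)$ for a trigonometric polynomial $f$. Second, $2^{-j/2}\sum_\eta |\psi_{j\eta}(y)|$ stays of the same size across levels, as Lemma 1 predicts for $d = 2$.

```python
import numpy as np

def _h(u):
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, np.exp(-1.0 / np.where(u > 0, u, 1.0)), 0.0)

def a(x):
    """Hypothetical cutoff: smooth, decreasing, 1 on [0, 1/2], 0 on [1, inf)."""
    s = 2.0 * np.asarray(x, dtype=float) - 1.0
    return 1.0 - _h(s) / (_h(s) + _h(1.0 - s))

def b(x):
    x = np.asarray(x, dtype=float)
    return a(x / 2.0) - a(x)

def needlets(j, theta):
    """Matrix of psi_{j,eta}(theta): one row per quadrature point eta in H_j on S^1."""
    n_pts = 2 ** (j + 2) + 1                    # rule exact up to degree 2^(j+2)
    eta = 2.0 * np.pi * np.arange(n_pts) / n_pts
    lam = 2.0 * np.pi / n_pts                   # quadrature weight lambda_eta
    k = np.arange(1, 2 ** (j + 1))              # frequencies where b(k/2^j) may be nonzero
    wk = np.sqrt(b(k / 2.0 ** j))
    theta = np.asarray(theta, dtype=float)
    # cos(k(theta - eta)) = cos(k eta) cos(k theta) + sin(k eta) sin(k theta)
    ce, se = np.cos(np.outer(eta, k)) * wk, np.sin(np.outer(eta, k)) * wk
    ct, st = np.cos(np.outer(k, theta)), np.sin(np.outer(k, theta))
    return np.sqrt(lam) / np.pi * (ce @ ct + se @ st)

def f(t):
    # a trigonometric polynomial whose integral over S^1 equals 1, like a density
    return 1.0 / (2 * np.pi) + 0.1 * np.cos(3 * t) + 0.05 * np.sin(7 * t) - 0.02 * np.cos(20 * t)

J, M = 5, 4096
grid = 2.0 * np.pi * np.arange(M) / M          # equispaced grid: exact for degree < M
x = np.array([0.3, 1.7, 4.0])
approx = np.full(x.shape, 1.0 / (2 * np.pi))   # A_0(f) = (1/omega_1) * integral of f
for j in range(J):
    beta = needlets(j, grid) @ f(grid) * (2.0 * np.pi / M)   # beta_{j,eta} = <f, psi_{j,eta}>
    approx += beta @ needlets(j, x)
# A_J(f) multiplies the degree-k component of f by a(k / 2^J)
exact = (1.0 / (2 * np.pi) + a(3 / 2 ** J) * 0.1 * np.cos(3 * x)
         + a(7 / 2 ** J) * 0.05 * np.sin(7 * x) - a(20 / 2 ** J) * 0.02 * np.cos(20 * x))
print(np.max(np.abs(approx - exact)))

ratios = []
for j in range(4, 8):
    y = np.linspace(0.0, 2.0 * np.pi / (2 ** (j + 2) + 1), 25)  # the sum is periodic in y
    ratios.append(np.abs(needlets(j, y)).sum(axis=0).max() / 2 ** (j / 2))
print(ratios)
```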
2.2 Regularity spaces of functions

We will use the standard definition of Hölder continuity for $t \in [0,1]$:

Definition 1. Let $t \in [0,1]$, $\delta, M > 0$ and $x \in S^{d-1}$. We say $f$ is $t$-Hölder-continuous at $x$ with parameters $M, \delta$ if

$$\sup_{y \in B(x,\delta),\, y \neq x} \frac{|f(y) - f(x)|}{d(x,y)^t} \le M.$$

We also say that $f \in C^t_{M,\delta}(x)$.

However, the standard way of defining Hölder continuity for $t > 1$ uses differentiability and charts. This has proved awkward to work with, so we use an alternative. We first note that smooth functions can be approximated by polynomials with quantitative error bounds. If $g : \mathbb{R} \to \mathbb{R}$ is $t$-Hölder-continuous in a $\delta$-ball around $0$, then by the mean value theorem (or Taylor's theorem), for all $x \in B(0,\delta)$ there exists $c \in [0,x]$ such that

$$g(x) = g(0) + g'(0)\, x + \cdots + \frac{g^{(\lfloor t \rfloor - 1)}(0)}{(\lfloor t \rfloor - 1)!}\, x^{\lfloor t \rfloor - 1} + \frac{g^{(\lfloor t \rfloor)}(c)}{\lfloor t \rfloor!}\, x^{\lfloor t \rfloor}.$$

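This suggests defining a local Hölder norm in a $\delta$-neighbourhood of $0$ as

$$\|g\|_{C^t(B(0,\delta))} = \max_{i = 0, 1, \ldots, \lfloor t \rfloor} |g^{(i)}(0)| + \sup_{x \in B(0,\delta)} \frac{|g^{(\lfloor t \rfloor)}(x) - g^{(\lfloor t \rfloor)}(0)|}{|x|^{t - \lfloor t \rfloor}}.$$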



We can thus write $g(x) = P(x) + R(x)$, where $P$ is a polynomial of degree $\lfloor t \rfloor$ bounded by $\|g\|_{C^t(B(0,\delta))} \sum_{i=0}^{\lfloor t \rfloor} |x|^i / i! \le \|g\|_{C^t(B(0,\delta))}\, e^{|x|}$, and the remainder $R$ is bounded on $B(0,\delta)$ by $|x|^t\, \|g\|_{C^t(B(0,\delta))}$. This definition generalises easily to $d$ dimensions, and thus motivates the following definition of local $t$-Hölder continuity for functions $f : S^{d-1} \to \mathbb{R}$:

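Definition 2. Let $t, \delta, M > 0$ and $x \in S^{d-1}$. We say $f$ is $t$-Hölder-continuous at $x$ with parameters $M, \delta$ if there exists a spherical polynomial $P_f := P_{f,x}$ of degree $\lfloor t \rfloor$ such that

$$\sup_{y \in B(x,\delta),\, y \neq x} \frac{|f(y) - P_f(y)|}{d(x,y)^t} \le M, \qquad \|P_f\|_\infty \le M.$$

We also say that $f \in C^t_{M,\delta}(x)$.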
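A one-dimensional illustration of the decomposition $g = P + R$ that motivates Definition 2 (our own example, not from the paper): for $g(x) = \cos x + |x|^{5/2}$ and $t = 5/2$, the Taylor polynomial of degree $\lfloor t \rfloor = 2$ at $0$ is $P(x) = 1 - x^2/2$, because the derivatives of $|x|^{5/2}$ up to order 2 vanish at $0$. The sketch checks numerically that $|g(x) - P(x)| \le M |x|^t$ on $B(0,1)$ with $M = 1 + 1/24$.

```python
import numpy as np

t = 2.5

def g(x):
    return np.cos(x) + np.abs(x) ** t

def taylor_poly(x):
    # Degree floor(t) = 2 Taylor polynomial of g at 0 (|x|^t contributes nothing up to order 2)
    return 1.0 - x ** 2 / 2.0

x = np.linspace(-1.0, 1.0, 2001)
x = x[np.abs(x) > 5e-4]                    # exclude the centre, where the ratio is 0/0
ratio = np.abs(g(x) - taylor_poly(x)) / np.abs(x) ** t
# cos x - 1 + x^2/2 lies in [0, x^4/24], so the ratio lies in [1, 1 + |x|^{3/2}/24]
print(ratio.min(), ratio.max())
```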

For further justification of the above definition, consider a function $f$ which satisfies the traditional definition:

Definition 3. Let $U \subset S^{d-1}$ be a neighbourhood of $x$ and let $C : U \to \mathbb{R}^{d-1}$ be a chart. Then $f$ is $t$-Hölder-continuous at $x$ if $f \circ C^{-1}$ is $t$-Hölder-continuous in a neighbourhood of $C(x)$ for all such charts $C$.
We can show that such an $f$ satisfies Definition 2 for some $M$. Without loss of generality, let $x = (1, 0, \ldots, 0)$ and let $T_x$ be the tangent space attached to $x$; in this example, this is the plane $\{x_1 = 1\}$. We then have a local chart from the eastern hemisphere $\{x_1 > 0\}$ to $T_x$: $C(x_1, x_2, \ldots, x_d) = (1, x_2, \ldots, x_d)$. By our hypothesis, $\tilde f = f \circ C^{-1}$ is locally $t$-Hölder-continuous on $B(x,\delta)$, with $\delta$ and $M$ depending on $f$. Now if we extend $\tilde f$ to a function $g$ on $B(x,\delta) \subset \mathbb{R}^d$ by $g(x_1, x_2, \ldots, x_d) = \tilde f(1, x_2, \ldots, x_d)$, then $g$ too is locally $t$-Hölder-continuous on $B(x,\delta)$ with $\delta$ and $M$ depending on $f$, so by Taylor's theorem $g$ can be approximated in this ball by a polynomial $P$ of degree $\lfloor t \rfloor$ such that $|g(y) - P(y)| \le M d(x,y)^t$. Now we use the fact that all polynomials restricted to $S^{d-1}$ can be written as a sum of spherical harmonics, see [18], and this yields a spherical polynomial $P_f$, which is bounded because it is a continuous function on a compact set. This shows that Definition 2 is sensible.

In contrast to spherical harmonics, the needlet frame allows us to describe local regularity properties of functions $f : S^{d-1} \to \mathbb{R}$ through the decay of the 'local' needlet series representation of $f$. We give three instances of this fact, all of which will be useful in what follows. We emphasise that these facts, although not difficult to prove, do not follow from the characterisation of Besov spaces in [15].

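Proposition 1. Let $x \in S^{d-1}$ and $f \in C^t_{M,\delta}(x)$. Then there exist $C_2, C_3$, depending only on $M, \psi, \delta, d, t, \|f\|_\infty$, such that for all $i, j$:

$$|A_j(f)(x) - f(x)| \le C_2\, 2^{-jt}, \qquad |B_i(f)(x)| \le \sum_{\eta \in \mathcal{H}_i} |\beta_{i\eta}\, \psi_{i\eta}(x)| \le C_3\, 2^{-it}.$$

Also, let $K > 0$ be fixed. Then there exists $C_4$, depending only on $M, \psi, \delta, d, t, \|f\|_\infty, K$, such that for all $i$ and all $\eta \in \mathcal{H}_i$ satisfying $d(x,\eta) \le K 2^{-i}$, we have

$$|\beta_{i\eta}| \le C_4\, 2^{-i(2t+d-1)/2}.$$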



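We also state the minimax optimal rate for density estimation over the spaces we have defined:

Theorem 1. Let $t > 0$, $\eta \in S^{d-1}$, and let $X = \{X_1, X_2, \ldots, X_n\}$ be an independent and identically distributed sample of $n$ values from a probability density $f : S^{d-1} \to [0,\infty)$. Let $\mathcal{F}_n$ denote the set of all estimators $\hat f = \hat f(X_1, \ldots, X_n) : S^{d-1} \to [0,\infty)$. Then there exist $c, \delta, M > 0$ such that

$$\liminf_{n \to \infty}\, \inf_{\hat f \in \mathcal{F}_n}\, \sup_{f \in C^t_{M,\delta}(\eta)} E_f \left| n^{t/(2t+d-1)} \big(\hat f(X_1, \ldots, X_n)(\eta) - f(\eta)\big) \right| \ge c,$$

where the supremum is taken over probability densities $f$ in $C^t_{M,\delta}(\eta)$.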



3 Localised Density Estimation by Needlets 

We first show that the linear needlet estimator $\hat f^n_J$,

$$\hat f^n_J = \frac{1}{\omega_{d-1}} + \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} \hat\beta_{i\eta}\, \psi_{i\eta}, \qquad \text{where } \hat\beta_{i\eta} = \frac{1}{n} \sum_{k=1}^{n} \psi_{i\eta}(X_k), \qquad (4)$$

is minimax-optimal over the regularity space $C^t_{M,\delta}(x)$ if we are allowed to pick $J$ as a function of the regularity $t$. We need the following proposition to show that the estimated coefficients are not too far from the true values.
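Proposition 2. Let $x \in S^{d-1}$ and let $f : S^{d-1} \to \mathbb{R}$ be a bounded probability density function. Further, let $X = \{X_1, X_2, \ldots, X_n\}$ be an independent and identically distributed sample of $n$ values from $f$, and let $\hat f^n_j$ be as defined in Equation (4). Then there exists a constant $C_5$ such that for all $i$, all $\eta \in \mathcal{H}_i$ and all $j$:

$$E|\hat\beta_{i\eta} - \beta_{i\eta}| \le \sqrt{\|f\|_\infty}\, n^{-1/2}, \qquad E|\hat f^n_j(x) - A_j(f)(x)| \le C_5 \sqrt{\|f\|_\infty}\, 2^{j(d-1)/2}\, n^{-1/2}.$$

Furthermore, let $2^{i(d-1)} \le n/\log n$, $v > 0$ and $\kappa(v) = \max\big(14v/(3\sqrt{\omega_{d-1}}),\ \|f\|_\infty \sqrt{\omega_{d-1}}\big)$. Then

$$P\left(|\hat\beta_{i\eta} - \beta_{i\eta}| > \kappa(v) \sqrt{\frac{\log n}{n}}\right) \le 2 n^{-v}.$$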
Using this result together with Proposition 1, it is easy to see that $2^J \sim n^{1/(2t+d-1)}$ balances the bias and variance terms, and that the resulting local error of estimation satisfies

$$E|\hat f^n_J(x) - f(x)| = O\big(n^{-t/(2t+d-1)}\big) \qquad \text{if } f \in C^t_{M,\delta}(x).$$

This corresponds to the local minimax rate of estimation at $x \in S^{d-1}$. This needlet estimator, although minimax optimal, requires knowledge of $t$, which is typically not available. Circumventing this requirement is the subject of this paper.
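The balancing step can be checked with a few lines of arithmetic. This sketch drops all constants (an assumption made purely for illustration) and treats $J$ as real-valued. It confirms that at $2^J = n^{1/(2t+d-1)}$ the bias order $2^{-Jt}$ and the standard-deviation order $2^{J(d-1)/2} n^{-1/2}$ both equal $n^{-t/(2t+d-1)}$.

```python
import math

def balancing_level(n, t, d):
    """Real-valued J with 2^{-Jt} = 2^{J(d-1)/2} n^{-1/2}, i.e. 2^J = n^{1/(2t+d-1)}."""
    return math.log2(n) / (2.0 * t + d - 1.0)

def bias_and_sd(n, t, d, J):
    bias = 2.0 ** (-J * t)                          # order of |A_J(f)(x) - f(x)| (Proposition 1)
    sd = 2.0 ** (J * (d - 1) / 2.0) / math.sqrt(n)  # order of E|f_J(x) - A_J(f)(x)| (Proposition 2)
    return bias, sd

n, t, d = 10 ** 6, 1.5, 3
J = balancing_level(n, t, d)
bias, sd = bias_and_sd(n, t, d, J)
rate = n ** (-t / (2.0 * t + d - 1.0))
print(J, bias, sd, rate)
```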

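Following Baldi et al. [2], we define the hard thresholded needlet estimator as

$$\hat f^n_{J,\kappa}(x) = \frac{1}{\omega_{d-1}} + \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} \hat\beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| > \kappa \sqrt{\log n / n}\}}\, \psi_{i\eta}(x). \qquad (5)$$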
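Given the empirical coefficients and the needlet values at $x$, evaluating the hard thresholded estimator at a point is a masked sum. The sketch below assumes these inputs are already available level by level as hypothetical arrays (one per level $i$, indexed by $\eta \in \mathcal{H}_i$); producing them requires an actual needlet implementation, which is not shown.

```python
import numpy as np

def hard_threshold_estimate(beta_hat, psi_at_x, n, kappa, omega):
    """Hard-thresholded needlet estimate at a single point x.

    beta_hat[i] and psi_at_x[i] are 1-D arrays over eta in H_i (levels i = 0, ..., J-1);
    omega is omega_{d-1}, the surface area of S^{d-1}.
    """
    tau = kappa * np.sqrt(np.log(n) / n)      # threshold kappa * sqrt(log n / n)
    value = 1.0 / omega                        # constant term 1 / omega_{d-1}
    for b_i, p_i in zip(beta_hat, psi_at_x):
        keep = np.abs(b_i) > tau               # keep only coefficients above the threshold
        value += float(np.sum(b_i[keep] * p_i[keep]))
    return value

beta_hat = [np.array([0.5, 0.1]), np.array([-0.3, 0.2])]
psi_at_x = [np.array([1.0, 2.0]), np.array([1.0, 1.0])]
est = hard_threshold_estimate(beta_hat, psi_at_x, n=100, kappa=1.0, omega=2 * np.pi)
print(est)
```

With $n = 100$ and $\kappa = 1$ the threshold is about $0.215$, so only the coefficients $0.5$ and $-0.3$ survive.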



Our first main result shows that the thresholded needlet estimator is locally minimax optimal within logarithmic factors. Note that the only unknown quantity required for the construction of the thresholded estimator is $\|f\|_\infty$, which can be replaced by $\|\hat f_J\|_\infty$ in practice, see for instance [6].



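Theorem 2. Let $n \ge 2$ and let $X = \{X_1, X_2, \ldots, X_n\}$ be an independent and identically distributed sample of $n$ values from some probability density function $f : S^{d-1} \to \mathbb{R}$ with $f \in C^t_{M,\delta}(x)$, $x \in S^{d-1}$. Let $\hat f^n_{J,\kappa}$ be the hard thresholded needlet estimator defined above. If $2^{J(d-1)} \sim n/\log n$ and $\kappa = 2\max\big(14/(3\sqrt{\omega_{d-1}}),\ \|f\|_\infty \sqrt{\omega_{d-1}}\big)$, then there exists $C = C(M, \psi, \delta, d, t, \|f\|_\infty)$ such that

$$\sup_{f \in C^t_{M,\delta}(x)} E|\hat f^n_{J,\kappa}(x) - f(x)| \le C \left(\frac{\log n}{n}\right)^{t/(2t+d-1)}.$$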



We now aim to use the local adaptation property of the thresholded needlet estimator to construct confidence intervals. That is, given the sample $X_1, X_2, \ldots, X_n$, we want to choose a data-driven $\sigma_{J,\alpha}(x)$ such that:

1. The confidence interval $[\hat f^n_{J,\kappa}(x) - 1.01\,\sigma_{J,\alpha}(x),\ \hat f^n_{J,\kappa}(x) + 1.01\,\sigma_{J,\alpha}(x)]$ is asymptotically honest with level $\alpha$ for every $x \in S^{d-1}$:
$$\liminf_{n \to \infty}\, \inf_{t}\, \inf_{f \in C^t_{M,\delta}(x)} P^n_f\Big(f(x) \in \big[\hat f^n_{J,\kappa}(x) - 1.01\,\sigma_{J,\alpha}(x),\ \hat f^n_{J,\kappa}(x) + 1.01\,\sigma_{J,\alpha}(x)\big]\Big) \ge 1 - \alpha.$$
2. The expected size of the confidence interval shrinks at the right rate in $n$, up to $\log n$ terms.



However, this is a fruitless task in general, as the results of Low [13] indicate that no honest confidence interval $C(x)$ can adapt to the local smoothness of $f$. This is due to pathological functions which masquerade as having a higher Hölder exponent than they actually have. With this in mind, the method indicated in [7, 9, 17] is to choose subsets $\tilde C^t_M(x) \subset C^t_M(x)$ which cut out these pathologies, so that the smoothness parameter $t$ becomes identified. In the case treated by Kerkyacharian, Nickl and Picard [11], we know that if $f \in C^t(S^{d-1})$ is globally $t$-Hölder, then $\|A_j(f) - f\|_\infty \le D_2\, 2^{-jt}$; hence it is natural to form $\tilde C^t \subset C^t$ by admitting only functions $f \in C^t(S^{d-1})$ which satisfy

$$D_1\, 2^{-jt} \le \|A_j(f) - f\|_\infty \le D_2\, 2^{-jt},$$

and there it was shown that 'typical' Hölder functions in $C^t(S^{d-1})$ satisfy this condition. There are more compelling results in wavelet theory: Giné and Nickl [7] showed that quasi-every function in $C^t(\mathbb{R})$ satisfies this lower bound condition, while cutting out masquerading pathologies. However, this condition is a global property and is thus not suitable for our purposes. We need to find a local analogue of this self-similarity property.

First, recall that at each level $i$ the needlets have a maximum height of order $2^{i(d-1)/2}$. Since we are only interested in the neighbourhood of a point $x$, we only consider needlets which attain a height of this order at $x$. By Equation (3), the centres $\eta$ of these needlets must be close to $x$, since

$$\psi_{i\eta}(x)\, 2^{-i(d-1)/2} \to 0 \quad \text{as } 2^i d(x,\eta) \to \infty;$$

hence $d(x,\eta) \le A 2^{-i}$ for some $A > 0$. Now, by Proposition 1 above,

$$|\beta_{i\eta}| \le C\, 2^{-i(2t+d-1)/2}$$

holds for those $\eta \in \mathcal{H}_i$. The following lower bound condition thus becomes natural:



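Condition 1. Let $f \in C^t_{M,\delta}(x)$. There exist $A, B > 0$ and a sequence $0 < p_n < 1$ such that

$$\max_{\eta \in \mathcal{H}_i:\ |\psi_{i\eta}(x)| \ge A\, 2^{i(d-1)/2}} |\beta_{i\eta}| \ge B\, 2^{-i(2t+d-1)/2},$$

where $i = i_n$ is such that $2^{i(2t+d-1)} \simeq n p_n$.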



We note from the lower bounds in Section 4.6 below that $|\psi_{i\eta}(x)| \ge A\, 2^{i(d-1)/2}$ for some $A > 0$ whenever $d(x,\eta) < c\, 2^{-i}$ for a suitable constant $c > 0$; hence, if the quadrature is dense enough, there are $\eta$ which satisfy the constraint in this condition. The sequence $p_n$ is the needlet analogue of the condition used in [17], and it measures the loss of adaptation. We note, for example, that each component of the sum

$$f(x) = \frac{1}{\omega_{d-1}} + \sum_j \alpha_j\, Z^{2^j}(\eta, x)$$

is orthogonal to all but one of the needlet levels; hence the $\alpha_j$ can be chosen such that Condition 1 holds for any sequence $p_n$.

Proposition 2 suggests taking $\hat\beta_{i\eta} \pm \kappa(v) \sqrt{\log n / n}$ as a confidence interval for each non-thresholded needlet coefficient $\beta_{i\eta}$. Our second main result is the following theorem.
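Theorem 3. Let $n \ge 2$, let $X = \{X_1, X_2, \ldots, X_n\}$ be an independent and identically distributed sample from some probability density function $f : S^{d-1} \to \mathbb{R}$, and let

$$\hat f^n_{J,\kappa_1}(x) = \frac{1}{\omega_{d-1}} + \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} \hat\beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n / n}\}}\, \psi_{i\eta}(x),$$

$$\sigma_{J,\alpha}(x) = \gamma_n\, \kappa_w \sqrt{\frac{\log n}{n}} \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n / n}\}}\, |\psi_{i\eta}(x)|,$$

with $\kappa(v) = \max\big(14v/(3\sqrt{\omega_{d-1}}),\ \|f\|_\infty \sqrt{\omega_{d-1}}\big)$, $\kappa_1 = \kappa(1)$, $\kappa_w = \kappa(w)$, $w$ such that $n^{-w} = \alpha/n$, and $J$ such that $2^{J(d-1)} \sim n/\log n$. Then, if $f \in C^t_{M,\delta}(x)$ satisfies Condition 1 with $\gamma_n (p_n \log n)^{(d-1)/(2(2t+d-1))}$ diverging and $p_n \log n$ converging to $0$, there exists $C$ depending on $M, \psi, \delta, d, t, \|f\|_\infty, \alpha$ such that

$$E\, \sigma_{J,\alpha}(x) \le C \gamma_n \left(\frac{n}{\log n}\right)^{-t/(2t+d-1)}$$

and

$$\liminf_{n \to \infty} P^n_f\Big(f(x) \in \big[\hat f^n_{J,\kappa_1}(x) - 1.01\,\sigma_{J,\alpha}(x),\ \hat f^n_{J,\kappa_1}(x) + 1.01\,\sigma_{J,\alpha}(x)\big]\Big) \ge 1 - \alpha.$$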
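The data-driven half-width $\sigma_{J,\alpha}(x) = \gamma_n \kappa_w \sqrt{\log n / n} \sum_{i<J}\sum_{\eta} 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1\sqrt{\log n/n}\}}\, |\psi_{i\eta}(x)|$, with $\kappa(v) = \max(14v/(3\sqrt{\omega_{d-1}}), \|f\|_\infty\sqrt{\omega_{d-1}})$ and $n^{-w} = \alpha/n$, can be computed as in the sketch below. It assumes that $\|f\|_\infty$ (or a plug-in estimate of it) is supplied and that the coefficient and needlet arrays are given per level; all names and inputs are hypothetical.

```python
import math
import numpy as np

def kappa(v, f_sup, omega):
    """kappa(v) = max(14 v / (3 sqrt(omega)), ||f||_inf * sqrt(omega))."""
    return max(14.0 * v / (3.0 * math.sqrt(omega)), f_sup * math.sqrt(omega))

def sigma_J_alpha(beta_hat, psi_at_x, n, alpha, gamma_n, f_sup, omega):
    """Half-width sigma_{J,alpha}(x) of the confidence interval; also returns w."""
    w = 1.0 - math.log(alpha) / math.log(n)         # solves n^{-w} = alpha / n
    k1, kw = kappa(1.0, f_sup, omega), kappa(w, f_sup, omega)
    rate = math.sqrt(math.log(n) / n)
    total = 0.0
    for b_i, p_i in zip(beta_hat, psi_at_x):
        keep = np.abs(b_i) > 2.0 * k1 * rate         # non-thresholded coefficients only
        total += float(np.abs(p_i[keep]).sum())
    return gamma_n * kw * rate * total, w

n, alpha = 1000, 0.05
beta_hat = [np.array([1.0, 0.01])]
psi_at_x = [np.array([2.0, 5.0])]
sigma, w = sigma_J_alpha(beta_hat, psi_at_x, n, alpha, gamma_n=math.log(n),
                         f_sup=0.5, omega=2 * math.pi)
print(sigma, w)
```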

Remark 1. Inspection of the proof of Theorem 3 shows that the confidence interval in Theorem 3 is asymptotically honest over $f \in \bigcup_{t \in [r,R]} \big\{ C^t_{M,\delta}(x) \cap \{\text{Condition 1 holds with } A, B, p_n\} \big\}$.

Remark 2. For simplicity, we have chosen $w$ to accommodate all $O(n)$ needlet coefficients. In practice, we suggest that $w$ be chosen such that $n^{-w} = \alpha/\#$, where $\#$ is the number of non-thresholded coefficients; this decreases the size of the interval without affecting the theoretical results.

Remark 3. To ensure that the 'large' needlet coefficient in Condition 1 is not thresholded, we require that $|\beta_{i\eta}| \ge B\, 2^{-i(2t+d-1)/2} = B (n p_n)^{-1/2}$ be larger than $2\kappa_1 \sqrt{\log n / n}$. Hence $p_n \log n$ has to converge to $0$. This is the loss of adaptation in hard thresholding. Nonetheless, if we assume the function is exactly self-similar, we can pick $p_n = (\log n)^{-2}$, say; choosing $\gamma_n = \log n$ then gives a confidence interval whose expected size is minimax-optimal up to $\log n$ terms.
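The choices in Remark 3 can be checked directly: with $p_n = (\log n)^{-2}$ and $\gamma_n = \log n$ one has $p_n \log n = 1/\log n \to 0$, while $\gamma_n (p_n \log n)^{(d-1)/(2(2t+d-1))} = (\log n)^{1 - (d-1)/(2(2t+d-1))}$ diverges, because the exponent $(d-1)/(2(2t+d-1))$ is below $1/2$. The sketch evaluates both quantities for the illustrative values $t = 1$, $d = 3$ (our choice).

```python
import math

def remark3_quantities(n, t, d):
    """Return (p_n log n, gamma_n (p_n log n)^e) for p_n = (log n)^-2 and gamma_n = log n."""
    log_n = math.log(n)
    p_n, gamma_n = log_n ** -2, log_n
    e = (d - 1) / (2.0 * (2.0 * t + d - 1))
    return p_n * log_n, gamma_n * (p_n * log_n) ** e

ns = [1e2, 1e4, 1e8, 1e16, 1e32]
vals = [remark3_quantities(n, t=1.0, d=3) for n in ns]
for n, (shrinking, growing) in zip(ns, vals):
    print(f"n={n:.0e}  p_n log n={shrinking:.4f}  gamma_n (p_n log n)^e={growing:.3f}")
```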

Remark 4. For practical implementation, the only unknown quantity we require is $\|f\|_\infty$. This may be replaced by $\|\hat f_J\|_\infty$ in practice, see for instance the proof of Theorem 2 in [6]. However, it must be stressed that although the constants given here suffice for the theoretical results, practical implementation may require better constants, which is beyond the scope of this paper.

4 Proofs 

Our main focus here is on Theorems 2 and 3, so we start by proving these, assuming Lemma 1 and Propositions 1 and 2. We then prove these three statements. Finally, we give the lower bounds for $\psi$ and for the size of the confidence intervals.
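4.1 Proof of Theorem 2

Proof of Theorem 2. Let $J_1$ be such that $2^{J_1} \sim (n/\log n)^{1/(2t+d-1)}$, in such a way that $J > J_1$, and write $\tau_n = \kappa \sqrt{\log n / n}$. By the triangle inequality,

$$E|\hat f^n_{J,\kappa}(x) - f(x)| \le E|\hat f^n_{J_1}(x) - f(x)| + E\Big|\sum_{i=0}^{J_1-1} \sum_{\eta} \hat\beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| \le \tau_n\}}\, \psi_{i\eta}(x)\Big| + E\Big|\sum_{i=J_1}^{J-1} \sum_{\eta} \hat\beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| > \tau_n\}}\, \psi_{i\eta}(x)\Big|.$$

By Propositions 1 and 2 we have

$$E|\hat f^n_{J_1}(x) - f(x)| \le |A_{J_1}(f)(x) - f(x)| + E|\hat f^n_{J_1}(x) - A_{J_1}(f)(x)| \le \big(C_2 + C_5 \sqrt{\|f\|_\infty}\big) \left(\frac{\log n}{n}\right)^{t/(2t+d-1)}.$$

Now we bound the second term using Lemma 1:

$$E\Big|\sum_{i=0}^{J_1-1} \sum_{\eta} \hat\beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| \le \tau_n\}}\, \psi_{i\eta}(x)\Big| \le \kappa \sqrt{\frac{\log n}{n}} \sum_{i=0}^{J_1-1} \sum_{\eta} |\psi_{i\eta}(x)| \le C_1 \kappa \sqrt{\frac{\log n}{n}} \sum_{i=0}^{J_1-1} 2^{i(d-1)/2} \le C \left(\frac{\log n}{n}\right)^{t/(2t+d-1)}.$$

Now we deal with the last term. If $|\hat\beta_{i\eta}| > \tau_n$, then either $|\beta_{i\eta}| > \tau_n/2$, or $|\beta_{i\eta}| \le \tau_n/2$ and $|\hat\beta_{i\eta} - \beta_{i\eta}| > \tau_n/2$; in the latter case $|\hat\beta_{i\eta}| \le 2|\hat\beta_{i\eta} - \beta_{i\eta}|$. Hence

$$E\Big|\sum_{i=J_1}^{J-1} \sum_{\eta} \hat\beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| > \tau_n\}}\, \psi_{i\eta}(x)\Big| \le \sum_{i=J_1}^{J-1} \sum_{\eta} |\beta_{i\eta}|\, 1_{\{|\beta_{i\eta}| > \tau_n/2\}}\, |\psi_{i\eta}(x)| + \sum_{i=J_1}^{J-1} \sum_{\eta} E|\hat\beta_{i\eta} - \beta_{i\eta}|\, 1_{\{|\beta_{i\eta}| > \tau_n/2\}}\, |\psi_{i\eta}(x)| + 2 \sum_{i=J_1}^{J-1} \sum_{\eta} E\big(|\hat\beta_{i\eta} - \beta_{i\eta}|\, 1_{\{|\hat\beta_{i\eta} - \beta_{i\eta}| > \tau_n/2\}}\big)\, |\psi_{i\eta}(x)|.$$

First, we use Proposition 1:

$$\sum_{i=J_1}^{J-1} \sum_{\eta} |\beta_{i\eta}|\, 1_{\{|\beta_{i\eta}| > \tau_n/2\}}\, |\psi_{i\eta}(x)| \le \sum_{i=J_1}^{J-1} \sum_{\eta} |\beta_{i\eta}\, \psi_{i\eta}(x)| \le C_3 \sum_{i \ge J_1} 2^{-it} \le C \left(\frac{\log n}{n}\right)^{t/(2t+d-1)}.$$

We then bound the second term using Propositions 1 and 2, since $1_{\{|\beta_{i\eta}| > \tau_n/2\}} \le 2|\beta_{i\eta}|/\tau_n$:

$$\sum_{i=J_1}^{J-1} \sum_{\eta} E|\hat\beta_{i\eta} - \beta_{i\eta}|\, 1_{\{|\beta_{i\eta}| > \tau_n/2\}}\, |\psi_{i\eta}(x)| \le \sqrt{\frac{\|f\|_\infty}{n}}\, \frac{2}{\kappa} \sqrt{\frac{n}{\log n}} \sum_{i=J_1}^{J-1} \sum_{\eta} |\beta_{i\eta}\, \psi_{i\eta}(x)| \le C \left(\frac{\log n}{n}\right)^{t/(2t+d-1)}.$$

Using the Cauchy-Schwarz inequality, the third part of Proposition 2 with $v = 1$ (note that $\tau_n/2 = \kappa(1)\sqrt{\log n/n}$) and Lemma 1, we deduce

$$\sum_{i=J_1}^{J-1} \sum_{\eta} E\big(|\hat\beta_{i\eta} - \beta_{i\eta}|\, 1_{\{|\hat\beta_{i\eta} - \beta_{i\eta}| > \tau_n/2\}}\big)\, |\psi_{i\eta}(x)| \le \sum_{i=0}^{J-1} \sum_{\eta} |\psi_{i\eta}(x)| \sqrt{E|\hat\beta_{i\eta} - \beta_{i\eta}|^2}\, \sqrt{P\big(|\hat\beta_{i\eta} - \beta_{i\eta}| > \tau_n/2\big)} \le C\, 2^{J(d-1)/2} \sqrt{\frac{\|f\|_\infty}{n}} \sqrt{\frac{2}{n}} \le \frac{C'}{\sqrt{n \log n}}.$$

So by combining these inequalities, we obtain the result. ■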



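4.2 Proof of Theorem 3

Proof of Theorem 3. We start by showing that $\sigma_{J,\alpha}$ is of the right size. We first note that $w = 1 + \log(1/\alpha)/\log n$ is bounded above by $1 + \log_2(1/\alpha)$ for $n \ge 2$, and hence $\kappa_w$ is also bounded above, so we omit it from what follows. We also drop the factor $\gamma_n$, which appears on both sides. Let $2^{J_1} \sim (n/\log n)^{1/(2t+d-1)}$ with $J_1 < J$. We split the sum into three parts using the triangle inequality:

$$E \sum_{i=0}^{J-1} \sum_{\eta} \sqrt{\frac{\log n}{n}}\, 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n/n}\}}\, |\psi_{i\eta}(x)| \le T_1 + T_2 + T_3,$$

where

$$T_1 = \sum_{i=0}^{J_1-1} \sum_{\eta} \sqrt{\frac{\log n}{n}}\, |\psi_{i\eta}(x)|, \qquad T_2 = E \sum_{i=J_1}^{J-1} \sum_{\eta} \sqrt{\frac{\log n}{n}}\, 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n/n},\ |\beta_{i\eta}| \le \kappa_1 \sqrt{\log n/n}\}}\, |\psi_{i\eta}(x)|,$$

$$T_3 = \sum_{i=J_1}^{J-1} \sum_{\eta} \sqrt{\frac{\log n}{n}}\, 1_{\{|\beta_{i\eta}| > \kappa_1 \sqrt{\log n/n}\}}\, |\psi_{i\eta}(x)|.$$

We bound $T_1$ using Lemma 1:

$$T_1 \le C_1 \sqrt{\frac{\log n}{n}} \sum_{i=0}^{J_1-1} 2^{i(d-1)/2} \le C \left(\frac{n}{\log n}\right)^{-t/(2t+d-1)}.$$

We bound $T_2$ using Proposition 2 (with $v = 1$) and Lemma 1, since $|\hat\beta_{i\eta}| > 2\kappa_1\sqrt{\log n/n}$ and $|\beta_{i\eta}| \le \kappa_1\sqrt{\log n/n}$ imply $|\hat\beta_{i\eta} - \beta_{i\eta}| > \kappa_1\sqrt{\log n/n}$:

$$T_2 \le \sum_{i=J_1}^{J-1} \sum_{\eta} \sqrt{\frac{\log n}{n}}\, 2n^{-1}\, |\psi_{i\eta}(x)| \le 2C n^{-1} \sqrt{\frac{\log n}{n}}\, 2^{J(d-1)/2} \le 2C n^{-1} \le C' \left(\frac{n}{\log n}\right)^{-t/(2t+d-1)}.$$

We bound $T_3$ using Proposition 1, since $1_{\{|\beta_{i\eta}| > \kappa_1\sqrt{\log n/n}\}} \le |\beta_{i\eta}| / (\kappa_1 \sqrt{\log n/n})$ and $\kappa_1 \ge 14/(3\sqrt{\omega_{d-1}})$:

$$T_3 \le \frac{1}{\kappa_1} \sum_{i=J_1}^{J-1} \sum_{\eta} |\beta_{i\eta}\, \psi_{i\eta}(x)| \le \frac{3\sqrt{\omega_{d-1}}}{14}\, C_3 \sum_{i \ge J_1} 2^{-it} \le C \left(\frac{n}{\log n}\right)^{-t/(2t+d-1)}.$$

Summing these inequalities, we obtain the bound on $E\,\sigma_{J,\alpha}(x)$.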



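Now we show asymptotic coverage. We aim to show that for all $f \in C^t_{M,\delta}(x)$ satisfying Condition 1,

$$\liminf_{n \to \infty} P^n_f\big(|f(x) - \hat f^n_{J,\kappa_1}(x)| \le 1.01\,\sigma_{J,\alpha}(x)\big) \ge 1 - \alpha.$$

Define the event $E_n$ as follows:

$$E_n = \left\{ \forall\, i \le J-1,\ \forall\, \eta \in \mathcal{H}_i:\ |\hat\beta_{i\eta} - \beta_{i\eta}| \le \kappa_w \sqrt{\frac{\log n}{n}} \right\}.$$

We also split $f(x)$ into two terms $f_1 + f_2$:

$$f_1 = \frac{1}{\omega_{d-1}} + \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} \beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n/n}\}}\, \psi_{i\eta}(x),$$

$$f_2 = \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} \beta_{i\eta}\, 1_{\{|\hat\beta_{i\eta}| \le 2\kappa_1 \sqrt{\log n/n}\}}\, \psi_{i\eta}(x) + \sum_{i \ge J} \sum_{\eta \in \mathcal{H}_i} \beta_{i\eta}\, \psi_{i\eta}(x).$$

Then the following are true:

1. There exists $N$ such that for all $n > N$, $P(E_n) \ge 1 - \alpha$.
2. Under $E_n$, $|f_1 - \hat f^n_{J,\kappa_1}(x)| \le \sigma_{J,\alpha}(x)$ for $n$ large enough that $\gamma_n \ge 1$.
3. Under $E_n$, there exists a constant $C$ depending on $M, \psi, \delta, d, t, \|f\|_\infty, \alpha$ such that $|f_2| \le C (n/\log n)^{-t/(2t+d-1)}$.
4. Under $E_n$, there exist constants $N$ and $C'$ depending on $\|f\|_\infty$ and on $A, B, p_n$ from Condition 1 such that for all $n > N$, $\sigma_{J,\alpha}(x) \ge C' \gamma_n (p_n \log n)^{(d-1)/(2(2t+d-1))} (n/\log n)^{-t/(2t+d-1)}$.

Proof of part 1. By Proposition 2 we know that

$$P\left(|\hat\beta_{i\eta} - \beta_{i\eta}| > \kappa_w \sqrt{\frac{\log n}{n}}\right) \le 2n^{-w} = \frac{2\alpha}{n}.$$

Now $|\mathcal{H}_i| \le C\, 2^{i(d-1)}$, so $\sum_{i=0}^{J-1} |\mathcal{H}_i| \le C\, 2^{J(d-1)} \le C\, n/\log n$, and thus

$$P(E_n^c) \le \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} P\left(|\hat\beta_{i\eta} - \beta_{i\eta}| > \kappa_w \sqrt{\frac{\log n}{n}}\right) \le \frac{2C\alpha}{\log n}.$$

Taking $\log N \ge 2C$ yields the result. ■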
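Proof of part 2. We can bound $|f_1 - \hat f^n_{J,\kappa_1}(x)|$ by

$$|f_1 - \hat f^n_{J,\kappa_1}(x)| \le \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n/n}\}}\, |\hat\beta_{i\eta} - \beta_{i\eta}|\, |\psi_{i\eta}(x)|.$$

On the event $E_n$, and since $\gamma_n \ge 1$ for $n$ large enough (as $\gamma_n \to \infty$ under the assumptions of Theorem 3), this is smaller than

$$\sigma_{J,\alpha}(x) = \gamma_n\, \kappa_w \sqrt{\frac{\log n}{n}} \sum_{i=0}^{J-1} \sum_{\eta \in \mathcal{H}_i} 1_{\{|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n/n}\}}\, |\psi_{i\eta}(x)|. \qquad ■$$

Proof of part 3. Let $2^{J_1} \sim (n/\log n)^{1/(2t+d-1)}$ such that $J_1 < J$. On $E_n$, a coefficient with $|\hat\beta_{i\eta}| \le 2\kappa_1 \sqrt{\log n/n}$ satisfies $|\beta_{i\eta}| \le (2\kappa_1 + \kappa_w) \sqrt{\log n/n}$, so

$$|f_2| \le \sum_{i=0}^{J_1-1} \sum_{\eta \in \mathcal{H}_i} (2\kappa_1 + \kappa_w) \sqrt{\frac{\log n}{n}}\, |\psi_{i\eta}(x)| + \sum_{i=J_1}^{J-1} \sum_{\eta \in \mathcal{H}_i} |\beta_{i\eta}\, \psi_{i\eta}(x)| + \Big|\sum_{i \ge J} \sum_{\eta \in \mathcal{H}_i} \beta_{i\eta}\, \psi_{i\eta}(x)\Big|.$$

We bound the first term by Lemma 1:

$$\sum_{i=0}^{J_1-1} \sum_{\eta} (2\kappa_1 + \kappa_w) \sqrt{\frac{\log n}{n}}\, |\psi_{i\eta}(x)| \le C (2\kappa_1 + \kappa_w) \sqrt{\frac{\log n}{n}}\, 2^{J_1(d-1)/2} \le C' \left(\frac{n}{\log n}\right)^{-t/(2t+d-1)}.$$

We bound the second term using Proposition 1 by

$$\sum_{i=J_1}^{J-1} \sum_{\eta} |\beta_{i\eta}\, \psi_{i\eta}(x)| \le C_3 \sum_{i \ge J_1} 2^{-it} \le C \left(\frac{n}{\log n}\right)^{-t/(2t+d-1)}.$$

Finally, by Proposition 1, the last term equals $|f(x) - A_J(f)(x)| \le C_2\, 2^{-Jt} \simeq C_2 (n/\log n)^{-t/(d-1)}$, which is of smaller order. Summing the inequalities gives the result. ■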

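Proof of part 4. Let $i$ and $\eta$ be the arguments in Condition 1, that is,

$$|\beta_{i\eta}| \ge B\, 2^{-i(2t+d-1)/2}, \qquad |\psi_{i\eta}(x)| \ge A\, 2^{i(d-1)/2}.$$

From the definition of $i$, $|\beta_{i\eta}| \ge B (n p_n)^{-1/2}$. Since $p_n \log n$ converges to $0$, there exists $N'$ depending only on $B$, $p_n$ and $\|f\|_\infty$ such that for all $n > N'$, $B (n p_n)^{-1/2} > (2\kappa_1 + \kappa_w) \sqrt{\log n/n}$ (recalling that $\kappa_w$ is bounded above by a fixed constant). Thus, on $E_n$, $|\hat\beta_{i\eta}| > 2\kappa_1 \sqrt{\log n/n}$, so this coefficient is not thresholded, and

$$\sigma_{J,\alpha}(x) \ge \gamma_n\, \kappa_w \sqrt{\frac{\log n}{n}}\, |\psi_{i\eta}(x)| \ge A \kappa_w \gamma_n \sqrt{\frac{\log n}{n}}\, (n p_n)^{(d-1)/(2(2t+d-1))} = A \kappa_w \gamma_n\, (p_n \log n)^{(d-1)/(2(2t+d-1))} \left(\frac{n}{\log n}\right)^{-t/(2t+d-1)},$$

which, together with $\kappa_w \ge \kappa_1 > 0$ (as $w \ge 1$), gives the result. ■

We can choose $N$ large enough such that, under $E_n$, $0.01\,\sigma_{J,\alpha}(x) > |f_2|$; this follows from parts 3 and 4, since $\gamma_n (p_n \log n)^{(d-1)/(2(2t+d-1))}$ diverges. Thus, under $E_n$, the triangle inequality and part 2 give

$$|f(x) - \hat f^n_{J,\kappa_1}(x)| \le |f_1 - \hat f^n_{J,\kappa_1}(x)| + |f_2| \le 1.01\,\sigma_{J,\alpha}(x),$$

and part 1 then gives asymptotic coverage. ■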
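The exponent bookkeeping in part 4 (with $2^{i(2t+d-1)} = n p_n$) can be verified numerically. The following sketch, using arbitrary illustrative values of $n, p_n, t, d$, compares $\sqrt{\log n/n}\,(n p_n)^{(d-1)/(2(2t+d-1))}$ with $(p_n \log n)^{(d-1)/(2(2t+d-1))} (n/\log n)^{-t/(2t+d-1)}$.

```python
import math

def lhs(n, p, t, d):
    # sqrt(log n / n) * 2^{i(d-1)/2}, where 2^{i(2t+d-1)} = n p
    return math.sqrt(math.log(n) / n) * (n * p) ** ((d - 1) / (2.0 * (2.0 * t + d - 1)))

def rhs(n, p, t, d):
    s = 2.0 * t + d - 1
    return (p * math.log(n)) ** ((d - 1) / (2.0 * s)) * (n / math.log(n)) ** (-t / s)

cases = [(1e4, 0.1, 0.5, 2), (1e6, 0.01, 1.5, 3), (1e9, 0.3, 2.7, 5)]
for n, p, t, d in cases:
    print(n, lhs(n, p, t, d), rhs(n, p, t, d))
```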
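4.3 Upper bounds on the $L^2$ and $L^\infty$ norms of $\psi_{i\eta}$

We will need the following bounds on the $L^2$ and $L^\infty$ norms of $\psi_{i\eta}$ to prove the propositions and Lemma 1.

Lemma 2. For all $\eta \in \mathcal{H}_i$, $\|\psi_{i\eta}\|_2 \le 1$ and $\|\psi_{i\eta}\|_\infty \le 2\, \omega_{d-1}^{-1/2}\, 2^{i(d-1)/2}$.

Proof. Since $x \mapsto \big(\sum_{k < 2^{i+1}} Z^k(\eta, x)\big)^2$ is a polynomial of degree at most $2^{i+2}$, by quadrature

$$\int_{S^{d-1}} \Big(\sum_{k < 2^{i+1}} Z^k(\eta, x)\Big)^2 dx = \sum_{\xi \in \mathcal{H}_i} \lambda_\xi \Big(\sum_{k < 2^{i+1}} Z^k(\eta, \xi)\Big)^2 \ge \lambda_\eta \Big(\sum_{k < 2^{i+1}} Z^k(\eta, \eta)\Big)^2.$$

However, recalling equation (1),

$$\int_{S^{d-1}} \Big(\sum_{k < 2^{i+1}} Z^k(\eta, x)\Big)^2 dx = \sum_{k < 2^{i+1}} \int_{S^{d-1}} Z^k(\eta, x)^2\, dx = \sum_{k < 2^{i+1}} Z^k(\eta, \eta).$$

Thus $\lambda_\eta \le \big(\sum_{k < 2^{i+1}} Z^k(\eta, \eta)\big)^{-1}$. Recall that $\psi_{i\eta}(x) = \sqrt{\lambda_\eta} \sum_{2^{i-1} < k < 2^{i+1}} \sqrt{b(k/2^i)}\, Z^k(\eta, x)$ and that $Z^k(\eta, \cdot)$ attains its maximum $Z^k(\eta,\eta) = \dim \mathcal{H}_k(S^{d-1}) / \omega_{d-1}$ at $\eta$. Since $0 \le b \le 1$, we have

$$\|\psi_{i\eta}\|_\infty \le \sqrt{\lambda_\eta} \sum_{2^{i-1} < k < 2^{i+1}} Z^k(\eta, \eta) \le \sqrt{\sum_{k < 2^{i+1}} Z^k(\eta, \eta)}.$$

Now, by a telescoping sum,

$$\sum_{k < 2^{i+1}} Z^k(\eta, \eta) = \frac{1}{\omega_{d-1}} \left[\binom{2^{i+1} + d - 2}{d-1} + \binom{2^{i+1} + d - 3}{d-1}\right] \le \frac{2\, (2^{i+1} + d - 2)^{d-1}}{(d-1)!\, \omega_{d-1}}.$$

Thus $\|\psi_{i\eta}\|_\infty \le 2\, \omega_{d-1}^{-1/2}\, 2^{i(d-1)/2}$. ■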
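The telescoping identity and the binomial bound used above can be checked exactly in integer arithmetic. The sketch uses the standard dimension formula $\dim\mathcal{H}_k(S^{d-1}) = \binom{k+d-1}{d-1} - \binom{k+d-3}{d-1}$ (the second term read as $0$ when $k + d - 3 < 0$) and $\binom{m}{d-1} \le m^{d-1}/(d-1)!$; the function names are our own.

```python
from math import comb, factorial

def dim_H(k, d):
    """dim H_k(S^{d-1}) = C(k+d-1, d-1) - C(k+d-3, d-1), second term 0 when k+d-3 < 0."""
    return comb(k + d - 1, d - 1) - (comb(k + d - 3, d - 1) if k + d - 3 >= 0 else 0)

def partial_sum(K, d):
    """sum_{k < K} dim H_k(S^{d-1}) = omega_{d-1} * sum_{k < K} Z^k(eta, eta)."""
    return sum(dim_H(k, d) for k in range(K))

def closed_form(K, d):
    return comb(K + d - 2, d - 1) + comb(K + d - 3, d - 1)

for d in range(2, 6):
    print(d, [(partial_sum(2 ** (i + 1), d), closed_form(2 ** (i + 1), d)) for i in range(4)])
```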

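4.4 Proofs of Lemma 1 and Proposition 2

Before we prove Lemma 1, we need Lemma 6 from Baldi et al. [2]:

Lemma 3. There exists $C_6$ such that for all $y$ and $i$, $\displaystyle\sum_{\eta \in \mathcal{H}_i} \frac{1}{(1 + 2^i d(y,\eta))^3} \le C_6$.

Proof of Lemma 1. We use Lemma 3 together with the bound for $|\psi_{i\eta}(y)|$ in equation (3) with $m = 3$:

$$\sum_{\eta \in \mathcal{H}_i} |\psi_{i\eta}(y)| \le c_3\, 2^{i(d-1)/2} \sum_{\eta \in \mathcal{H}_i} \frac{1}{(1 + 2^i d(y,\eta))^3} \le c_3\, C_6\, 2^{i(d-1)/2},$$

which gives the result. ■

This leads us on to proving the first two parts of Proposition 2.

Proof of the first two parts of Proposition 2. Since $\|\psi_{i\eta}\|_2 \le 1$, we have $E|\hat\beta_{i\eta} - \beta_{i\eta}| \le \big(\operatorname{Var}(\psi_{i\eta}(X_1))/n\big)^{1/2} \le \big(\|f\|_\infty \|\psi_{i\eta}\|_2^2 / n\big)^{1/2} \le \sqrt{\|f\|_\infty / n}$, and hence, by Lemma 1,

$$E|\hat f^n_j(x) - A_j(f)(x)| \le \sum_{i=0}^{j-1} \sum_{\eta} E|\hat\beta_{i\eta} - \beta_{i\eta}|\, |\psi_{i\eta}(x)| \le \sqrt{\frac{\|f\|_\infty}{n}} \sum_{i=0}^{j-1} C_1\, 2^{i(d-1)/2} \le C_5 \sqrt{\|f\|_\infty}\, 2^{j(d-1)/2}\, n^{-1/2},$$

with $C_5 = C_1 / (2^{(d-1)/2} - 1)$, so $C_5$ depends only on $C_1$ and $d$. ■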

To prove the third part, we invoke the well-known Bernstein inequality:

Lemma 4 (Bernstein's inequality). If $Y_1, \ldots, Y_n$ are mean zero independent and identically distributed random variables taking values in $[-c, c]$ for some constant $0 < c < \infty$, then

$$P\left(\Big|\sum_{k=1}^{n} Y_k\Big| > u\right) \le 2 \exp\left(-\frac{u^2}{2n\, E Y_1^2 + (2/3)\, c\, u}\right).$$



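Proof of the third part of Proposition 2. Let $Y_k = \psi_{i\eta}(X_k) - \beta_{i\eta}$; then $\sum_{k=1}^n Y_k = n(\hat\beta_{i\eta} - \beta_{i\eta})$ and $E(Y_k) = 0$. We have $E(Y_k^2) \le E(\psi_{i\eta}(X_k)^2) \le \|f\|_\infty \|\psi_{i\eta}\|_2^2$, and thus, by Lemma 2,

$$E Y_k^2 \le \|f\|_\infty, \qquad \|Y_k\|_\infty \le 2\|\psi_{i\eta}\|_\infty \le 4\, \omega_{d-1}^{-1/2}\, 2^{i(d-1)/2},$$

so we may take $c = 4\, \omega_{d-1}^{-1/2}\, 2^{i(d-1)/2}$. Hence, by Bernstein's inequality with $u = \kappa \sqrt{n \log n}$, where $\kappa = \kappa(v)$,

$$P\left(\Big|\sum_{k=1}^n Y_k\Big| > \kappa \sqrt{n \log n}\right) \le 2\exp\left(-\frac{\kappa^2\, n \log n}{2n\|f\|_\infty + (8/3)\, \omega_{d-1}^{-1/2}\, 2^{i(d-1)/2}\, \kappa \sqrt{n \log n}}\right) \le 2\exp\left(-\frac{\kappa^2 \log n}{2\|f\|_\infty + (8/3)\, \omega_{d-1}^{-1/2}\, \kappa}\right) \le 2n^{-v},$$

using the facts $2^{i(d-1)} \le n/\log n$ and $\kappa = \max\big(14v/(3\sqrt{\omega_{d-1}}),\ \|f\|_\infty \sqrt{\omega_{d-1}}\big)$. For the last step, note that $\kappa^2 \ge \frac{14v}{3\sqrt{\omega_{d-1}}} \cdot \|f\|_\infty \sqrt{\omega_{d-1}} = \frac{14}{3} v \|f\|_\infty$ and $\frac{8}{3}\, \omega_{d-1}^{-1/2}\, \kappa\, v \le \frac{8}{14} \kappa^2$, so that $v\big(2\|f\|_\infty + \frac{8}{3}\, \omega_{d-1}^{-1/2}\, \kappa\big) \le \frac{6}{14}\kappa^2 + \frac{8}{14}\kappa^2 = \kappa^2$. ■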
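The final inequality only needs $\kappa(v)^2 \ge v\,\big(2\|f\|_\infty + \tfrac{8}{3}\, \omega_{d-1}^{-1/2}\, \kappa(v)\big)$, that is, a coefficient of $\log n$ in the Bernstein exponent of at least $v$. The sketch below checks this over a grid of illustrative values of $v$, $\|f\|_\infty$ and $\omega_{d-1}$ (the grid is our own choice).

```python
import itertools
import math

def kappa(v, f_sup, omega):
    """kappa(v) = max(14 v / (3 sqrt(omega)), ||f||_inf * sqrt(omega))."""
    return max(14.0 * v / (3.0 * math.sqrt(omega)), f_sup * math.sqrt(omega))

def exponent_coefficient(v, f_sup, omega):
    """Coefficient of log n in the Bernstein exponent once 2^{i(d-1)} <= n / log n is used."""
    k = kappa(v, f_sup, omega)
    return k ** 2 / (2.0 * f_sup + (8.0 / 3.0) * k / math.sqrt(omega))

omegas = [2 * math.pi, 4 * math.pi, 2 * math.pi ** 2]      # omega_1, omega_2, omega_3
grid = itertools.product([0.1, 1.0, 2.0, 10.0], [0.05, 0.2, 1.0, 5.0], omegas)
worst = min(exponent_coefficient(v, f_sup, omega) / v for v, f_sup, omega in grid)
print(worst)   # the argument above shows this ratio is at least 1
```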
4.5 Proof of Proposition 1

We will need the following integral bounds:

Lemma 5. There exists a constant $C$ such that for all $t > 0$, $\delta > 0$, $x \in S^{d-1}$ and $j$:

$$\int_{S^{d-1}} \frac{d(x,y)^t}{(1 + 2^j d(x,y))^{d+t}}\, dy \le C\, 2^{-j(t+d-1)}, \qquad \int_{S^{d-1}} \frac{1_{\{d(x,y) > \delta\}}}{(1 + 2^j d(x,y))^{d+t}}\, dy \le C\, \delta^{-t}\, 2^{-j(t+d-1)}.$$

Proof. We use an integration formula for zonal functions on $S^{d-1}$, see, e.g., Proposition 9.1.2 in Faraut [4]: if $f : S^{d-1} \to \mathbb{R}^+$ is such that there exist a point $x_0 \in S^{d-1}$ and a function $F : \mathbb{R}^+ \to \mathbb{R}^+$ with $f(y) = F(d(x_0, y))$, then, with $\theta = d(x_0, y) = \cos^{-1}\langle x_0, y \rangle$ the angle between $x_0$ and $y$, one has

$$\int_{S^{d-1}} f(y)\, dy = \omega_{d-2} \int_0^\pi F(\theta) \sin^{d-2}\theta\, d\theta \le C \int_0^\pi F(\theta)\, \theta^{d-2}\, d\theta.$$

By this integration formula and the substitution $u = 2^j \theta$, we have

$$\int_{S^{d-1}} \frac{d(x,y)^t}{(1 + 2^j d(x,y))^{d+t}}\, dy \le C \int_0^\pi \frac{\theta^{d+t-2}}{(1 + 2^j\theta)^{d+t}}\, d\theta \le C\, 2^{-j(t+d-1)} \int_0^\infty \frac{u^{d+t-2}}{(1+u)^{d+t}}\, du \le C\, 2^{-j(t+d-1)} \int_0^\infty \frac{du}{(1+u)^2} = C\, 2^{-j(t+d-1)}.$$

Similarly,

$$\int_{S^{d-1}} \frac{1_{\{d(x,y) > \delta\}}}{(1 + 2^j d(x,y))^{d+t}}\, dy \le C \int_\delta^\pi \frac{\theta^{d-2}}{(1 + 2^j\theta)^{d+t}}\, d\theta \le C\, 2^{-j(d-1)} \int_{2^j\delta}^\infty \frac{du}{(1+u)^{t+2}} \le C\, 2^{-j(t+d-1)}\, \delta^{-t},$$

completing the proof of the lemma. ■
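As a numerical illustration of Lemma 5 on $S^2$ ($d = 3$, so $\omega_{d-2} = \omega_1 = 2\pi$), the sketch below evaluates both integrals through the zonal formula. It checks that, after multiplying by $2^{j(t+d-1)}$ (and by $\delta^t$ for the second integral), they stay below $2\pi$, which is the constant the argument above yields when $\sin\theta \le \theta$ is used. The values $t = 1/2$, $\delta = 0.3$ and the integration grid are illustrative choices.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (written out to avoid NumPy-version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def lemma5_integrals(j, t, delta, n_grid=200_001):
    """Both integrals of Lemma 5 on S^2 (d = 3) via the zonal formula with omega_1 = 2 pi."""
    d = 3
    theta = np.concatenate(([0.0], np.geomspace(1e-10, np.pi, n_grid)))  # dense near 0
    weight = 2.0 * np.pi * np.sin(theta) ** (d - 2)
    kern = 1.0 / (1.0 + 2.0 ** j * theta) ** (d + t)
    first = trapezoid(weight * theta ** t * kern, theta)
    second = trapezoid(weight * kern * (theta > delta), theta)
    return first, second

t, delta = 0.5, 0.3
scaled = []
for j in range(2, 11):
    first, second = lemma5_integrals(j, t, delta)
    factor = 2.0 ** (j * (t + 2))          # 2^{j(t+d-1)} with d = 3
    scaled.append((first * factor, second * factor * delta ** t))
print(scaled)
```

For large $j$ the first scaled value approaches $2\pi \int_0^\infty u^{t+1}(1+u)^{-(t+3)}\, du = 2\pi/(t+2)$.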

Proof of the first part of Proposition 1. Since $A_j(x,y)$ is bounded by equation (2), we only need to consider those $j$ with $2^{j-1} \ge \lfloor t \rfloor$ and may ignore the finitely many $j$ which do not fulfil this property. We have a polynomial $P_f$ of degree $\lfloor t \rfloor$ such that $|f(y) - P_f(y)| \le M d(x,y)^t$ on $B(x,\delta)$. Hence, recalling that $A_j$ reproduces polynomials of degree at most $2^{j-1}$ (as $a = 1$ on $[0, 1/2]$), we have

$$|A_j(f)(x) - f(x)| \le |A_j(P_f)(x) - P_f(x)| + |A_j(f - P_f)(x)| + |(f - P_f)(x)| = |A_j(f - P_f)(x)| = \left|\int_{S^{d-1}} A_j(x,y)\, (f - P_f)(y)\, dy\right|,$$

because $(f - P_f)(x) = 0$ and $j$ is large enough that $A_j(P_f) = P_f$. We then split this integral into two parts, over $B(x,\delta)$ and over its complement, and deal with each part separately. On $B(x,\delta)$ we have a bound on the size of $(f - P_f)(y)$; together with equation (2) with $m = d+t$ and Lemma 5, we obtain

$$\left|\int_{B(x,\delta)} A_j(x,y)\, (f - P_f)(y)\, dy\right| \le \int_{B(x,\delta)} |A_j(x,y)|\, |(f - P_f)(y)|\, dy \le \int_{S^{d-1}} \frac{c_{d+t}\, 2^{j(d-1)}}{(1 + 2^j d(x,y))^{d+t}}\, M d(x,y)^t\, dy \le c_{d+t}\, M C\, 2^{-jt}.$$

Similarly, since $|f - P_f| \le \|f\|_\infty + M$,

$$\left|\int_{B(x,\delta)^c} A_j(x,y)\, (f - P_f)(y)\, dy\right| \le (M + \|f\|_\infty) \int_{B(x,\delta)^c} |A_j(x,y)|\, dy \le (M + \|f\|_\infty) \int_{S^{d-1}} \frac{c_{d+t}\, 2^{j(d-1)}\, 1_{\{d(x,y) > \delta\}}}{(1 + 2^j d(x,y))^{d+t}}\, dy \le (M + \|f\|_\infty)\, c_{d+t}\, C\, \delta^{-t}\, 2^{-jt}.$$

Summing the inequalities gives the result. ■

Proof of the second part of Proposition [TJ We first note that since *pi V (y) is bounded 
in equation [21 we will only need to consider those i such that 2 2_1 > [t\ and ignore the finite 
number of j which do not fulfill this property, we have J sd _ 1 ipi V (y)Pf(y) dy — so j3i V — 
Jgd-i 4'iri(y)(f — Pf)(y) dy. We split the integral into two parts, that of around B(x, S) and that 
of its complement, and deal with each part separately. On £?(x,<5), we have a bound on the size 
of (f-P f )(y): 



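$$\begin{aligned} \left|\int_{B(x,\delta)} \sum_{\eta\in\mathcal H_i} \psi_{i\eta}(y)\psi_{i\eta}(x)(f-P_f)(y)\,dy\right| &\le \sum_{\eta\in\mathcal H_i}\int_{B(x,\delta)} |\psi_{i\eta}(y)\psi_{i\eta}(x)|\,|(f-P_f)(y)|\,dy \\ &\le \sum_{\eta\in\mathcal H_i}\int_{S^{d-1}} \frac{c_{d+t}\,2^{i(d-1)/2}}{(1+2^i d(y,\eta))^{d+t}}\cdot\frac{c_{d+t+3}\,2^{i(d-1)/2}}{(1+2^i d(x,\eta))^{d+t+3}}\, M\, d(x,y)^t\,dy \\ &\le c_{d+t}\,c_{d+t+3}\,M\int_{S^{d-1}}\sum_{\eta\in\mathcal H_i} \frac{2^{i(d-1)}}{(1+2^i d(x,\eta))^3\,(1+2^i d(x,y))^{d+t}}\, d(x,y)^t\,dy \\ &\le c_{d+t}\,c_{d+t+3}\,M\,C_6\,C\,2^{-it}, \end{aligned}$$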
using the fact that $(1+|x|)(1+|y|) \ge 1+|x|+|y|$, $1+2^i d(x,\eta) \le K$, and Lemmas [??] and [5].
Similarly,



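$$\begin{aligned} \left|\int_{B(x,\delta)^c} \sum_{\eta\in\mathcal H_i} \psi_{i\eta}(y)\psi_{i\eta}(x)(f-P_f)(y)\,dy\right| &\le (M+\|f\|_\infty)\sum_{\eta\in\mathcal H_i}\int_{B(x,\delta)^c} |\psi_{i\eta}(y)\psi_{i\eta}(x)|\,dy \\ &\le (M+\|f\|_\infty)\sum_{\eta\in\mathcal H_i}\int_{S^{d-1}} \frac{c_{d+t-1}\,2^{i(d-1)/2}}{(1+2^i d(y,\eta))^{d+t-1}}\cdot\frac{c_{d+t+2}\,2^{i(d-1)/2}}{(1+2^i d(x,\eta))^{d+t+2}}\,\mathbb 1_{\{d(x,y)\ge\delta\}}\,dy \\ &\le (M+\|f\|_\infty)\,c_{d+t-1}\,c_{d+t+2}\sum_{\eta\in\mathcal H_i}\frac{1}{(1+2^i d(x,\eta))^3}\int_{S^{d-1}}\frac{2^{i(d-1)}}{(1+2^i d(x,y))^{d+t-1}}\,\mathbb 1_{\{d(x,y)\ge\delta\}}\,dy \\ &\le (M+\|f\|_\infty)\,c_{d+t-1}\,c_{d+t+2}\,C_6\,C\,\delta^{-t}\,2^{-it}. \end{aligned}$$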



Summing the inequalities gives us the result. ■ 
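Both the second and the third parts rely on the elementary inequality $(1+2^i d(y,\eta))(1+2^i d(x,\eta)) \ge 1 + 2^i d(x,y)$, which follows from $(1+a)(1+b) \ge 1+a+b$ together with the triangle inequality for the geodesic distance. A quick randomized sanity check on $S^{d-1} \subset \mathbb R^d$ is sketched below; the helper names, the Gaussian sampling scheme and the floating-point slack are illustrative assumptions, not part of the paper.

```python
import math
import random


def random_unit_vector(dim: int, rng: random.Random) -> list:
    """Uniform random point on S^{dim-1}: a normalised standard Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def geodesic(x: list, y: list) -> float:
    """Great-circle distance d(x, y) = arccos(<x, y>), clamped against rounding."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))


def product_bound_holds(dim: int, i: int, trials: int = 2_000, seed: int = 0) -> bool:
    """Check (1 + 2^i d(y, eta)) (1 + 2^i d(x, eta)) >= 1 + 2^i d(x, y) on random triples."""
    rng = random.Random(seed)
    s = 2.0 ** i
    for _ in range(trials):
        x, y, eta = (random_unit_vector(dim, rng) for _ in range(3))
        lhs = (1.0 + s * geodesic(y, eta)) * (1.0 + s * geodesic(x, eta))
        rhs = 1.0 + s * geodesic(x, y)
        if lhs < rhs - 1e-6:  # small slack for floating-point error in acos
            return False
    return True


if __name__ == "__main__":
    print(product_bound_holds(3, 4), product_bound_holds(5, 8))
```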

Proof of the third part of Proposition [1]. We first note that since $\psi_{i\eta}(y)$ is bounded, in equation [3] we only need to consider those $i$ such that $2^{i-1} > \lfloor t\rfloor$, and we may ignore the finitely many $i$ which do not fulfil this property. Since $\psi_{i\eta}$ only involves spherical harmonics of degree greater than $2^{i-1}$, we have $\int_{S^{d-1}} \psi_{i\eta}(y)P_f(y)\,dy = 0$, so $\beta_{i\eta} = \int_{S^{d-1}} \psi_{i\eta}(y)(f-P_f)(y)\,dy$. We split the integral into two parts, that over $B(x,\delta)$ and that over its complement, and deal with each part separately. On $B(x,\delta)$, we have a bound on the size of $(f-P_f)(y)$; together with the bound for $\psi_{i\eta}(y)$ in equation [??], we have:



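$$\begin{aligned} \left|\int_{B(x,\delta)} \psi_{i\eta}(y)(f-P_f)(y)\,dy\right| &\le \int_{S^{d-1}} \frac{c_{t+d}\,2^{i(d-1)/2}\,(K+1)^{d+t}}{(1+2^i d(y,\eta)+2^i d(x,\eta))^{d+t}}\, M\, d(x,y)^t\,dy \\ &\le M\,c_{t+d}\,(K+1)^{d+t}\int_{S^{d-1}} \frac{2^{i(d-1)/2}}{(1+2^i d(y,x))^{d+t}}\, d(x,y)^t\,dy \\ &\le M\,c_{t+d}\,(K+1)^{d+t}\,C\,2^{-i(2t+d-1)/2}, \end{aligned}$$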



using the fact that $(1+|x|)(1+|y|) \ge 1+|x|+|y|$, $1+2^i d(x,\eta) \le K$, and Lemmas [3] and [5].
Similarly,



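$$\begin{aligned} \left|\int_{B(x,\delta)^c} \psi_{i\eta}(y)(f-P_f)(y)\,dy\right| &\le (\|f\|_\infty+M)\int_{B(x,\delta)^c} |\psi_{i\eta}(y)|\,dy \\ &\le (\|f\|_\infty+M)\int_{S^{d-1}} \frac{c_{d+t-1}\,2^{i(d-1)/2}\,(K+1)^{d+t-1}}{(1+2^i d(y,x))^{d+t-1}}\,\mathbb 1_{\{d(x,y)\ge\delta\}}\,dy \\ &\le (\|f\|_\infty+M)\,c_{d+t-1}\,(K+1)^{d+t-1}\,C\,\delta^{-t}\,2^{-i(2t+d-1)/2}. \end{aligned}$$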



Summing the inequalities gives the result. ■

For the lower bounds in this paper, we will require the following results: 



4.6 Lower bounds on the size of $\psi_{i\eta}$ near $\eta$

Lemma 6 Let the needlets be constructed such that $|a(7/4)| \ge c_a > 0$ and $|\lambda_\eta| \ge c_\lambda 2^{-i(d-1)}$ for $\eta \in \mathcal H_i$. Then there exist $C_7, C_8 > 0$ depending only on $c_a, c_\lambda, d$ such that, for all $i$, all $\eta \in \mathcal H_i$ and all $x$ with $d(x,\eta) \le C_8 2^{-i}$,

$$\psi_{i\eta}(x) \ge C_7\,2^{i(d-1)/2}.$$



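Proof. We first show that this is true for $x = \eta$. Recall from the proof of Lemma [5] that:

$$\psi_{i\eta}(\eta) = \sqrt{\lambda_\eta}\sum_{2^{i-1}<k<2^{i+1}} a\!\left(\frac{k}{2^i}\right) Z_k(\eta,\eta), \qquad Z_k(\eta,\eta) = \binom{d+k-1}{d-1} - \binom{d+k-3}{d-1}.$$

Hence:

$$\begin{aligned} \psi_{i\eta}(\eta) &\ge \sqrt{c_\lambda}\,2^{-i(d-1)/2}\,c_a \sum_{2^i\le k\le (7/4)2^i}\left[\binom{d+k-1}{d-1} - \binom{d+k-3}{d-1}\right] \\ &= \sqrt{c_\lambda}\,c_a\,2^{-i(d-1)/2}\left[\binom{d+\frac74 2^i-1}{d-1} + \binom{d+\frac74 2^i-2}{d-1} - \binom{d+2^i-2}{d-1} - \binom{d+2^i-3}{d-1}\right] \\ &\ge \sqrt{c_\lambda}\,c_a\,2^{-i(d-1)/2}\,\frac34\,2^i\left[\binom{d+2^i-2}{d-2} + \binom{d+2^i-3}{d-2}\right] \ge 2C_7\,2^{i(d-1)/2}, \end{aligned}$$

using $\binom{d+2^i-2}{d-2} \ge 2^{i(d-2)}/(d-2)!$ and the fact that, for integers $a, b \ge 0$ and $c \ge 1$,

$$\binom{a+b}{c} - \binom{b}{c} = \sum_{\ell=b}^{a+b-1}\binom{\ell}{c-1} \ge a\binom{b}{c-1}.$$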
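The combinatorial facts behind this step, the telescoping of $\sum_k Z_k(\eta,\eta)$, a lower bound on a difference of binomial coefficients, and the resulting bound on the sum over $2^i \le k \le (7/4)2^i$, can be verified exhaustively for small parameters. In the sketch below, `dim_harmonics` assumes the normalization $Z_k(\eta,\eta) = \binom{d+k-1}{d-1}-\binom{d+k-3}{d-1}$ (the dimension of the degree-$k$ spherical harmonics on $S^{d-1}$, for $d \ge 3$); all helper names are hypothetical.

```python
from math import comb


def dim_harmonics(d: int, k: int) -> int:
    """Z_k(eta, eta) = dim H_k(S^{d-1}) = binom(d+k-1, d-1) - binom(d+k-3, d-1); needs d >= 3."""
    return comb(d + k - 1, d - 1) - comb(d + k - 3, d - 1)


def normalization_ok(d: int, k: int) -> bool:
    """Agrees with (2k+d-2)/(d-2) * P_k^{(d-2)/2}(1), where P_k^{(d-2)/2}(1) = binom(k+d-3, k)."""
    return (2 * k + d - 2) * comb(k + d - 3, k) == (d - 2) * dim_harmonics(d, k)


def telescoping_ok(d: int, m: int, N: int) -> bool:
    """sum_{k=m}^{N} Z_k(eta, eta) collapses to four binomial coefficients."""
    lhs = sum(dim_harmonics(d, k) for k in range(m, N + 1))
    rhs = (comb(d + N - 1, d - 1) + comb(d + N - 2, d - 1)
           - comb(d + m - 2, d - 1) - comb(d + m - 3, d - 1))
    return lhs == rhs


def binomial_gap_ok(a: int, b: int, c: int) -> bool:
    """binom(a+b, c) - binom(b, c) = sum_{l=b}^{a+b-1} binom(l, c-1) >= a * binom(b, c-1)."""
    gap = comb(a + b, c) - comb(b, c)
    return gap == sum(comb(l, c - 1) for l in range(b, a + b)) and gap >= a * comb(b, c - 1)


def lower_bound_ok(d: int, i: int) -> bool:
    """sum_{2^i <= k <= (7/4) 2^i} Z_k >= (3/4) 2^i [binom(d+2^i-2, d-2) + binom(d+2^i-3, d-2)]."""
    m, N = 2 ** i, 7 * 2 ** (i - 2)  # i >= 2 keeps (7/4) 2^i an integer
    total = sum(dim_harmonics(d, k) for k in range(m, N + 1))
    return 4 * total >= 3 * m * (comb(d + m - 2, d - 2) + comb(d + m - 3, d - 2))


if __name__ == "__main__":
    print([dim_harmonics(3, k) for k in range(6)])  # 2k + 1 on the 2-sphere
```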

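Now, we will show that $Z_k(x,\eta) \ge Z_k(\eta,\eta)/2$. Since $d(x,\eta) \le C_8 2^{-i}$, we have $x\cdot\eta = \cos d(x,\eta) \ge 1 - C_8^2 2^{-2i}/2$. Hence, by the mean value theorem,

$$Z_k(x,\eta) = \frac{2k+d-2}{d-2}\,P_k^{(d-2)/2}(x\cdot\eta) \ge \frac{2k+d-2}{d-2}\left(P_k^{(d-2)/2}(1) - \frac{C_8^2 2^{-2i}}{2}\max_{s\in[0,1]}\frac{d}{ds}P_k^{(d-2)/2}(s)\right).$$

Therefore, to complete the proof, it suffices to show that we can pick $C_8$ such that for all $2^{i-1} < k < 2^{i+1}$,

$$\frac{C_8^2 2^{-2i}}{2}\max_{s\in[0,1]}\frac{d}{ds}P_k^{(d-2)/2}(s) \le \frac12\,P_k^{(d-2)/2}(1).$$

But $\frac{d}{ds}P_k^{a}(s) = 2a\,P_{k-1}^{a+1}(s)$, which is another Gegenbauer polynomial, and these attain their maximum on $[-1,1]$ at $1$, and so:

$$\frac{C_8^2 2^{-2i}}{2}\cdot\frac{\max_{s\in[0,1]}\frac{d}{ds}P_k^{(d-2)/2}(s)}{P_k^{(d-2)/2}(1)} = \frac{C_8^2 2^{-2i}}{2}\cdot\frac{(d-2)\binom{k+d-2}{k-1}}{\binom{k+d-3}{k}} = \frac{C_8^2\, 2^{-2i}\,k(k+d-2)}{2(d-1)} \le \frac{C_8^2\bigl(2+(d-2)2^{-i}\bigr)}{d-1} \le 2C_8^2,$$

using the fact that $P_a^b(1) = \binom{a+2b-1}{a}$ and $2^{i-1} < k < 2^{i+1}$. Hence any $C_8 \le 1/2$ works. ■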
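The Gegenbauer facts used in this proof can be checked numerically: the value $P_k^{b}(1) = \binom{k+2b-1}{k}$, the derivative rule $\frac{d}{ds}P_k^{a}(s) = 2aP_{k-1}^{a+1}(s)$, the fact that the derivative is largest at $s=1$ on $[0,1]$, the ratio $k(k+d-2)/(d-1)$, and the final bound $2^{-2i}k(k+d-2)/(2(d-1)) \le 2$ (the last display with $C_8^2$ factored out). The sketch below evaluates $P_k^{\lambda}$ by the standard three-term recurrence; the function names, grid size and tolerances are our own illustrative choices.

```python
from math import comb


def gegenbauer(k: int, lam: float, x: float) -> float:
    """P_k^lam(x) via n P_n = 2x(n + lam - 1) P_{n-1} - (n + 2 lam - 2) P_{n-2}."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, 2.0 * lam * x
    for n in range(2, k + 1):
        prev, cur = cur, (2.0 * x * (n + lam - 1) * cur - (n + 2 * lam - 2) * prev) / n
    return cur


def value_at_one_ok(d: int, k: int) -> bool:
    """P_k^{(d-2)/2}(1) == binom(k + d - 3, k)."""
    exact = comb(k + d - 3, k)
    return abs(gegenbauer(k, (d - 2) / 2, 1.0) - exact) <= 1e-9 * exact


def derivative_rule_ok(d: int, k: int, x: float = 0.3, h: float = 1e-6) -> bool:
    """Central difference of P_k^lam at x matches 2 lam P_{k-1}^{lam+1}(x)."""
    lam = (d - 2) / 2
    fd = (gegenbauer(k, lam, x + h) - gegenbauer(k, lam, x - h)) / (2 * h)
    exact = 2 * lam * gegenbauer(k - 1, lam + 1, x)
    scale = 2 * lam * gegenbauer(k - 1, lam + 1, 1.0)  # size of the derivative
    return abs(fd - exact) <= 1e-6 * scale


def max_at_one_ok(d: int, k: int, grid: int = 500) -> bool:
    """|P_{k-1}^{d/2}| on [0, 1] never exceeds its value at 1."""
    top = gegenbauer(k - 1, d / 2, 1.0)
    return all(abs(gegenbauer(k - 1, d / 2, s / grid)) <= top * (1 + 1e-12) for s in range(grid + 1))


def ratio_ok(d: int, k: int) -> bool:
    """(d-2) binom(k+d-2, k-1) / binom(k+d-3, k) == k(k+d-2)/(d-1), in integers."""
    return (d - 2) * comb(k + d - 2, k - 1) * (d - 1) == k * (k + d - 2) * comb(k + d - 3, k)


def final_bound_ok(d: int, i: int) -> bool:
    """2^{-2i} k (k+d-2) / (2(d-1)) <= 2 for all 2^{i-1} < k < 2^{i+1}."""
    lo = 2 ** (i - 1) if i >= 1 else 0
    return all(k * (k + d - 2) <= 4 * (d - 1) * 4 ** i for k in range(lo + 1, 2 ** (i + 1)))
```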

We end this paper with a demonstration of the lower bound for the problem of density estimation. 



4.7 Lower Bound on the Testing Problems 

A standard way of finding a lower bound for the problem of density estimation is via the following testing problem:

Proposition 3 Let $\psi: S^{d-1} \to \mathbb R$ be bounded such that $\int_{S^{d-1}}\psi(x)\,dx = 0$ and $\int_{S^{d-1}}\psi(x)^2\,dx \le 1$. Further, let $X = \{X_1, X_2, \dots, X_n\}$ be an independent and identically distributed sample of $n$ values from $f$ under the following hypotheses:

$$H_0: f = 1, \qquad H_1: f = 1 + \frac{\psi}{2\sqrt n}.$$

Now consider the set of all tests $\Psi: (S^{d-1})^n \to \{0,1\}$. Then:

$$\inf_\Psi \max_{j\in\{0,1\}} P_{H_j}(\Psi \ne j) \ge \frac{1}{60}.$$



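Proof. We have the following:

$$\begin{aligned} \inf_\Psi\max_{j\in\{0,1\}} P_{H_j}(\Psi\ne j) &\ge \frac12\inf_\Psi\bigl(E_{H_0}(\Psi) + E_{H_1}(1-\Psi)\bigr) \\ &\ge \frac12\inf_\Psi\left(E_{H_0}(\Psi) + E_{H_0}\!\left((1-\Psi)\,\frac13\,\mathbb 1\!\left\{\frac{dP_{H_1}}{dP_{H_0}} \ge \frac13\right\}\right)\right) \\ &\ge \frac16\inf_\Psi E_{H_0}\!\left(\bigl(\Psi + (1-\Psi)\bigr)\,\mathbb 1\!\left\{\frac{dP_{H_1}}{dP_{H_0}} \ge \frac13\right\}\right) = \frac16\,P_{H_0}\!\left(\frac{dP_{H_1}}{dP_{H_0}} \ge \frac13\right) \\ &\ge \frac16\left(1 - P_{H_0}\!\left(\left|\frac{dP_{H_1}}{dP_{H_0}} - 1\right| > \frac23\right)\right) \ge \frac16\left(1 - \frac32\,E_{H_0}\left|\frac{dP_{H_1}}{dP_{H_0}} - 1\right|\right) \\ &\ge \frac16\left(1 - \frac32\sqrt{E_{H_0}\!\left[\left(\frac{dP_{H_1}}{dP_{H_0}}\right)^2\right] - 1}\right). \end{aligned}$$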
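But using the fact that $\int_{S^{d-1}}\psi(x)\,dx = 0$ and $\int_{S^{d-1}}\psi(x)^2\,dx \le 1$,

$$E_{H_0}\!\left[\left(\frac{dP_{H_1}}{dP_{H_0}}\right)^2\right] - 1 = \left(\int_{S^{d-1}}\left(1 + \frac{\psi(x)}{2\sqrt n}\right)^2 dx\right)^n - 1 = \left(1 + \frac{1}{4n}\int_{S^{d-1}}\psi(x)^2\,dx\right)^n - 1 \le \left(1+\frac{1}{4n}\right)^n - 1 \le e^{1/4} - 1 < 0.36.$$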



Putting these two estimates together yields the result. ■ 

We relate the testing problem to that of density estimation in the following way: 

Proof of Theorem [1]. For notational convenience, we only prove this for $0 < t < 1$. We will use the definitions of global Besov spaces from [??]. Recall that $\mathcal H_k(S^{d-1})$ is the space of spherical harmonics of degree $k$ and let $\Pi_n = \bigoplus_{k=0}^{n}\mathcal H_k(S^{d-1})$. Then we define the approximation error of a function $f$ by polynomials in this class to be:

$$E_k(f) = \inf_{P\in\Pi_k}\|f-P\|_\infty,$$

and the Besov space $B^t_{\infty\infty}$ as follows.

Definition 4 $f \in B^t_{\infty\infty}(M)$ if and only if $\sup_k k^t E_k(f) \le M$.

Let $2^{i+1} \sim n^{1/(2t+d-1)}$ and consider $\psi(x) := \psi_{i\eta}(x)$. It is a polynomial of degree at most $2^{i+1}$, so for $k \ge 2^{i+1}$, $E_k(1+\psi/(2\sqrt n)) = 0$. Since, by equation [??], $\|\psi\|_\infty \le c_1 2^{i(d-1)/2}$, we can use the constant polynomial $1$ as an approximation to obtain:

$$\sup_{k<2^{i+1}} k^t E_k\!\left(1+\frac{\psi}{2\sqrt n}\right) \le 2^{(i+1)t}\,\frac{c_1\,2^{i(d-1)/2}}{2\sqrt n} \le c_1.$$

Thus $1+\psi/(2\sqrt n) \in B^t_{\infty\infty}(c_1)$ for all $n$. Now, by Theorem 5.5 in [5], such functions are also in $C^t(S^{d-1})$ (assuming $0 < t < 1$). Hence:



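$$\liminf_{n}\inf_{\hat f}\sup_{f}E_f\, n^{\frac{t}{2t+d-1}}\bigl|\hat f(X_1,X_2,\dots,X_n) - f(\eta)\bigr| \ge \liminf_{n}\inf_{\hat f}\max_{f=1\text{ or }1+\frac{\psi}{2\sqrt n}} E_f\, n^{\frac{t}{2t+d-1}}\bigl|\hat f(X_1,X_2,\dots,X_n) - f(\eta)\bigr|$$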
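Now we construct a test $\Psi = \mathbb 1\!\left(\hat f(X_1,X_2,\dots,X_n) \ge 1 + \frac{\psi(\eta)}{4\sqrt n}\right)$. Also, $\|\psi\|_2 \le 1$ (Lemma [2]) and $\int_{S^{d-1}}\psi_{i\eta} = 0$. Thus, by Proposition [3], we have:

$$\max_{f=1\text{ or }1+\frac{\psi}{2\sqrt n}} P_f\!\left(|\hat f - f(\eta)| \ge \frac{\psi(\eta)}{4\sqrt n}\right) \ge \max_{j\in\{0,1\}} P_{H_j}(\Psi \ne j) \ge \frac{1}{60},$$

and hence, by Markov's inequality,

$$\max_{f=1\text{ or }1+\frac{\psi}{2\sqrt n}} E_f\bigl(|\hat f - f(\eta)|\bigr) \ge \frac{\psi(\eta)}{240\sqrt n}.$$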



Combining everything we have, and using Lemma [6] above, which provides a constant $C_7$ such that

$$\psi_{i\eta}(\eta) \ge C_7\,2^{i(d-1)/2} = \frac{C_7\,n^{\frac{d-1}{2(2t+d-1)}}}{2^{(d-1)/2}},$$

we get:



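$$\liminf_{n}\inf_{\hat f}\sup_{f}E_f\, n^{\frac{t}{2t+d-1}}\bigl|\hat f(X_1,X_2,\dots,X_n) - f(\eta)\bigr| \ge \liminf_{n} \frac{C_7\,n^{\frac{d-1}{2(2t+d-1)}}}{2^{(d-1)/2}}\cdot\frac{n^{\frac{t}{2t+d-1}}}{240\sqrt n} = \frac{C_7}{240\cdot 2^{(d-1)/2}},$$

which is the bound needed. ■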
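The exponent bookkeeping in this proof can be checked mechanically. The sketch below uses hypothetical helper names and idealizes $2^{i+1} \sim n^{1/(2t+d-1)}$ as an exact equality (an assumption). It confirms that the powers of $n$ cancel in the final bound, that $2^{(i+1)t}2^{i(d-1)/2}/(2\sqrt n)$ equals $2^{-(d+1)/2} \le 1/2$, so the Besov bound holds with room to spare, and that Markov's inequality turns a $1/60$ testing error at threshold $\psi(\eta)/(4\sqrt n)$ into the constant $1/240$.

```python
import math
from fractions import Fraction


def leftover_exponent(t: Fraction, d: int) -> Fraction:
    """Power of n in n^{t/(2t+d-1)} * n^{(d-1)/(2(2t+d-1))} / sqrt(n)."""
    denom = 2 * t + d - 1
    return t / denom + Fraction(d - 1, 2) / denom - Fraction(1, 2)


def besov_ratio(t: float, d: int, n: float) -> float:
    """2^{(i+1)t} * 2^{i(d-1)/2} / (2 sqrt(n)) when 2^{i+1} is set exactly to n^{1/(2t+d-1)}."""
    two_pow_i_plus_1 = n ** (1.0 / (2 * t + d - 1))
    two_pow_i = two_pow_i_plus_1 / 2
    return two_pow_i_plus_1 ** t * two_pow_i ** ((d - 1) / 2) / (2 * math.sqrt(n))


# Markov's inequality: threshold 1/4 (in units of psi(eta)/sqrt(n)) times probability 1/60.
MARKOV_CONSTANT = Fraction(1, 4) * Fraction(1, 60)

if __name__ == "__main__":
    print(leftover_exponent(Fraction(1, 2), 3), besov_ratio(0.5, 3, 1e6), MARKOV_CONSTANT)
```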